
# Transactions and replication

In [None]:
!docker container ls -a

In [None]:
%%bash
docker run \
--rm \
--name my_mysql \
-v $(pwd)/mysql_databasefiles:/var/lib/mysql \
-p 3306:3306 \
-e MYSQL_ROOT_PASSWORD=deterentysker!42snapsnap \
-d \
mysql
echo "MySQLRunning"

In [None]:
import sys
import mysql.connector
import pandas as pd
from IPython.display import display


def rootconnect():
    """Connect to the demo database as root; returns None if the connection fails."""
    try:
        pw = 'deterentysker!42snapsnap'
        db = 'transactionDemo'
        conn = mysql.connector.connect(host='localhost', database=db, user='root', password=pw)
        conn.autocommit = True
        return conn
    except Exception as ex:
        print(str(ex), file=sys.stderr)
        return None
    

conn = rootconnect()

def sqlQuery(sqlString):
    global conn
    try:
        if conn is None or not conn.is_connected():
            conn = rootconnect()
            
        df = pd.read_sql(sqlString, conn)
        return df
    except Exception as ex:
        print(str(ex), file=sys.stderr)

def query(sqlString):
    display(sqlQuery(sqlString))

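def sqlDo(sqlString):
    """Execute a statement that returns no rows; returns any warnings it raised."""
    global conn
    cursor = None
    try:
        if conn is None or not conn.is_connected():
            conn = rootconnect()
        cursor = conn.cursor()
        cursor.execute(sqlString)
        res = cursor.fetchwarnings()
        return res
    except Exception as ex:
        print(str(ex), file=sys.stderr)
    finally:
        # cursor is still None if connecting failed, so guard the close
        if cursor is not None:
            cursor.close()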

"Done"    

In [None]:
query("""
SELECT * FROM account;
""")

query("""
select sum(balance) as total 
from account
""")

In pure SQL a transaction to move 500 from account 3 to account 4 might look like:
```mysql
START TRANSACTION;

UPDATE account
	set balance = balance + 500 where id = 4;
    
UPDATE account
	set balance = balance - 500 where id = 3;

COMMIT;
```
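The same transfer can be sketched from Python. This is a minimal illustration, not the notebook's MySQL setup: it assumes an in-memory SQLite database as a stand-in, a hypothetical `transfer` helper, and a `CHECK (balance >= 0)` constraint that the original `account` table may not have. The point is that when the second update fails, the rollback also undoes the first one.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server used in this notebook.
# isolation_level=None means autocommit, so BEGIN/COMMIT/ROLLBACK are issued by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # assumed constraint
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(3, 1000), (4, 200)])


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst`: both updates happen, or neither."""
    try:
        conn.execute("BEGIN")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        # Undo everything done since BEGIN, including the first UPDATE.
        conn.execute("ROLLBACK")
        return False


def balances(conn):
    """Return {id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, 3, 4, 500)
after_ok = balances(conn)
failed = transfer(conn, 3, 4, 5000)   # would make account 3 negative
after_failed = balances(conn)
print(ok, after_ok)
print(failed, after_failed)
```

After the failed transfer the balances are the same as before it started, even though the first `UPDATE` had already been executed inside the transaction.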

# Your turn
* Are two different "tabs" in MySQL Workbench in the same transaction or in different transactions?

* Connect two workbenches to the same database, 
* start a transaction, make some changes, and 
* verify that you cannot see the changes in the other workbench until after commit.

* Is hiding the changes always the best way to do things?

# ACID principle for transactions
* Atomicity (either all or nothing of the transaction is done)
* Consistency (all integrity constraints must be upheld after the transaction - e.g. foreign key constraints)
* Isolation (see below/later)
* Durability (committed transactions are permanent in the face of (power) failure)

# Implementation of transactions in DB
In short...

![](images/complex.jpg)
### Done in many different complex amazing and awesome ways

# Isolation inside a transaction
The "I" in ACID is *isolation*. It means that from your point of view, you are the only one making changes in the database (in this transaction).

Before we see how to obtain this, we need to see what can go wrong if we do not have isolation.

Consider this very simple table:

<img src="images/isolation0.png" width="15%">

<sub><sub>next slides based on figures from "https://en.wikipedia.org/wiki/Isolation_(database_systems)"</sub></sub>

# Problem 1: Dirty read
![](images/isolation1.png)

# Non-repeatable reads
![](images/isolation2.png)

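Dirty reads and non-repeatable reads can both cause the classic account update error, where a new balance is computed from a value that another transaction has changed in the meantime.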

# Phantom reads
![](images/isolation3.png)

# Isolation levels: Serializability / Snapshot isolation

* all reads made *in **a** transaction* will see a consistent snapshot of the database
* the transaction will successfully commit only if
    * no updates it has made conflict with any concurrent updates made since that snapshot.




<img src="images/saywhat.jpg" width="20%">

# Definition of isolation
Assume we have a snapshot $S_0$ of the database at some point in time $t_0$.

The state of the database at some time $t_n$ later than $t_0$ is:

$S_0 · T_0 · T_1 · T_2 · ... · T_n$

That is, the effect of applying the sequence of transactions $T_0$ through $T_n$ to the state $S_0$. 
<br><sub>(remember that single updates are considered transactions as well)</sub>

## Isolation of transaction
If $S_i · T_a · T_b = S_i · T_b · T_a$ we say $T_a$ and $T_b$ are *isolated*.
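The definition can be tried out on a toy model. In this sketch a state is a plain dict and a transaction is a function from state to state; `deposit`, `add_interest` and `isolated` are hypothetical names chosen for the example, not part of any database API.

```python
def apply_all(state, transactions):
    """Compute S · T_0 · T_1 · ... by applying the transactions in order."""
    result = dict(state)
    for transaction in transactions:
        result = transaction(result)
    return result


def isolated(state, t_a, t_b):
    """True if applying t_a then t_b gives the same state as t_b then t_a."""
    return apply_all(state, [t_a, t_b]) == apply_all(state, [t_b, t_a])


def deposit(account, amount):
    """Transaction that adds a fixed amount to one account."""
    def transaction(state):
        new_state = dict(state)
        new_state[account] += amount
        return new_state
    return transaction


def add_interest(account, percent):
    """Transaction that adds a percentage of the current balance (integer math)."""
    def transaction(state):
        new_state = dict(state)
        new_state[account] = new_state[account] * (100 + percent) // 100
        return new_state
    return transaction


s_i = {"a": 100, "b": 200}
two_deposits = isolated(s_i, deposit("a", 50), deposit("a", 25))
deposit_and_interest = isolated(s_i, deposit("a", 50), add_interest("a", 10))
different_accounts = isolated(s_i, deposit("a", 50), add_interest("b", 10))
print(two_deposits, deposit_and_interest, different_accounts)
```

Two deposits commute, and so do transactions on different accounts. A deposit and an interest calculation on the same account do not: the final balance depends on which one runs first.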

## Logging and dirty pages
One way to implement transactions is to:
* Snapshot at the start of the transaction
* then for every statement in the transaction
    * log the statement
    * execute the statement in a local version of the snapshot
    * (This is done by marking the pages as "dirty")
* when the transaction is done, check if it was isolated
    * and if so, update all the dirty pages, and write the transaction to the transaction log
    * if not, try again, or try a different scheme for isolation
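The steps above can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: a "page" is a dict entry, the isolation check is a per-page version counter, transactions only write to existing pages, and the class names `Database` and `Transaction` are made up for the example. Real engines do this in far more elaborate ways.

```python
class Database:
    def __init__(self, pages):
        self.pages = dict(pages)                      # page id -> content
        self.versions = {page: 0 for page in pages}   # bumped on each commit to a page
        self.transaction_log = []                     # statements of committed transactions

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, db):
        self.db = db
        self.local = dict(db.pages)              # snapshot taken at the start
        self.start_versions = dict(db.versions)  # page versions at snapshot time
        self.statements = []                     # log of this transaction's statements
        self.dirty = set()                       # pages changed in the local version

    def read(self, page):
        return self.local[page]

    def write(self, page, value):
        self.statements.append(("write", page, value))  # log the statement
        self.local[page] = value                          # execute it locally
        self.dirty.add(page)                              # mark the page as dirty

    def commit(self):
        # Isolated if nobody has committed to our dirty pages since our snapshot.
        if any(self.db.versions[p] != self.start_versions[p] for p in self.dirty):
            return False
        for page in self.dirty:                  # update all the dirty pages
            self.db.pages[page] = self.local[page]
            self.db.versions[page] += 1
        self.db.transaction_log.append(self.statements)
        return True


db = Database({"acc3": 1000, "acc4": 200})
t1, t2 = db.begin(), db.begin()
t1.write("acc3", t1.read("acc3") - 500)
t2.write("acc3", t2.read("acc3") - 100)
first = t1.commit()     # no conflicting commit since the snapshot
second = t2.commit()    # t1 changed acc3 after t2's snapshot

retry = db.begin()      # "if not, try again" on a fresh snapshot
retry.write("acc3", retry.read("acc3") - 100)
third = retry.commit()
print(first, second, third, db.pages)
```

The second transaction is rejected because its snapshot of `acc3` is stale; retrying on a fresh snapshot succeeds and both withdrawals end up in the final balance.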

# Locking (exclusive rights)
Before the transaction:
* Lock the whole schema
* or lock the whole table
* or lock the columns
* or lock the rows

needed in the transaction.

In MySQL, rows are locked with `SELECT ... FOR UPDATE` inside a transaction, either a single row:

```mysql
SELECT c1 FROM t WHERE c1 = 10 FOR UPDATE
```
or a range of rows:
```mysql
SELECT c1 FROM t WHERE c1 BETWEEN 10 and 20 FOR UPDATE
```

# Deadlocks
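If you try to lock an object that is already locked, you wait in a queue until the other transaction unlocks it. If two transactions each wait for a lock the other one holds, neither of them can ever proceed: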

| Transaction 1 | Transaction 2|
|:-------------:|:------------:|
|LOCK TABLES A WRITE | LOCK TABLES B WRITE |
|LOCK TABLES B WRITE | LOCK TABLES A WRITE |
|do stuff in A and B | do stuff in A and B |
|UNLOCK TABLES | UNLOCK TABLES | 
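The pattern in the table can be reproduced with plain Python threads. This is a sketch by analogy only: `threading.Lock` objects stand in for the table locks, a lock timeout stands in for whatever deadlock detection the database does, and `worker` and `run` are hypothetical helper names. It also shows one common remedy, which is to always take the locks in the same order.

```python
import threading


def worker(name, first, second, rendezvous, outcome):
    """Take `first`, then try to take `second`; give up after a short timeout."""
    with first:
        if rendezvous is not None:
            rendezvous.wait()            # both threads now hold their first lock
        got_second = second.acquire(timeout=0.2)
        if got_second:
            second.release()             # "do stuff in A and B" would go here
        outcome[name] = "done" if got_second else "deadlock detected"
        if rendezvous is not None:
            rendezvous.wait()            # keep holding `first` until both have tried


def run(order_1, order_2, rendezvous=None):
    """Run two transactions that take their locks in the given orders."""
    outcome = {}
    threads = [
        threading.Thread(target=worker, args=("T1", *order_1, rendezvous, outcome)),
        threading.Thread(target=worker, args=("T2", *order_2, rendezvous, outcome)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return outcome


lock_a, lock_b = threading.Lock(), threading.Lock()

# Opposite order, as in the table: each thread waits for the other's lock.
opposite_order = run((lock_a, lock_b), (lock_b, lock_a), threading.Barrier(2))

# Same order in both transactions: the second one simply waits its turn.
same_order = run((lock_a, lock_b), (lock_a, lock_b))
print(opposite_order)
print(same_order)
```

The barrier is only there to make the bad interleaving happen every time; in a real system it depends on timing, which is what makes deadlocks hard to reproduce.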


# Durability (recoverability)
The combination of snapshots and logging allows us to recover the database after a failure.

It uses the formula we saw earlier: $S_0 · T_0 · T_1 · T_2 · ... · T_n$
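As a minimal sketch of that formula, assume the snapshot is a dict of balances and the log is a list of committed transactions, each a list of `(account, delta)` pairs. Both formats and the `recover` function are made up for the illustration; real snapshots and logs are on-disk structures.

```python
def recover(snapshot, log):
    """Rebuild the current state as S_0 · T_0 · T_1 · ... · T_n."""
    state = dict(snapshot)                 # start from the last snapshot S_0
    for transaction in log:                # replay committed transactions in order
        for account, delta in transaction:
            state[account] = state.get(account, 0) + delta
    return state


snapshot_s0 = {3: 1000, 4: 200}
committed_log = [
    [(4, +500), (3, -500)],   # T_0: move 500 from account 3 to account 4
    [(3, +50)],               # T_1: deposit 50 into account 3
]
recovered = recover(snapshot_s0, committed_log)
print(recovered)
```

Only committed transactions appear in the log, so a transaction that was interrupted by the failure simply never happened from the point of view of the recovered state.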

# Consistency 
It is basically up to the application programmer to ensure consistency.

The database has some tools that help in this:
* Integrity constraints
* Foreign keys (with referential actions such as CASCADE, SET NULL and RESTRICT)
* Triggers (to call stored procedures as a side effect of updates)
* ...

#### transaction programmer:
Assume: all consistency rules are true before the transaction starts
<br>
Responsibility: Make sure all consistency rules are true after the transaction finishes

# Your turn
Consider the situation where you have to add a new order to classicmodels. 
* Which tables need to be updated?
* Are plain transactions sufficient, or do you actually need to lock any elements?
* What are the durability expectations for this action?
* Are there any concerns regarding isolation?

# BREAK

# Rest of semester
* Relational algebra 
* Spatial databases
* The inner workings of an ORM
* Graph databases (Neo4J)
* Larger project 

# Replication
* Scale-out
* Data security
* Fail-over
* Analytics
* Long-distance data distribution

# How replication is done in MySQL
1. Synchronize
2. The transaction log is sent as a stream from "master" to "slave"

![](https://lh3.ggpht.com/_41A-R4AR9qM/TOU5py-aVjI/AAAAAAAAAm0/YdthQoPQxRg/s800/postgres01.jpg)
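The two steps can be sketched with a toy in-memory model. The classes `Primary` (the "master") and `Replica` (the "slave") and the log format are assumptions made for the illustration; this is not how MySQL stores or ships its log.

```python
class Primary:
    """The "master": applies writes and records each one in an ordered log."""

    def __init__(self, data):
        self.data = dict(data)
        self.log = []                      # ordered (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """The "slave": starts from a copy, then follows the primary's log."""

    def __init__(self, primary):
        # Step 1: synchronize, and remember how far into the log the copy reaches.
        self.data = dict(primary.data)
        self.position = len(primary.log)

    def catch_up(self, primary):
        # Step 2: apply the log entries written since our position.
        new_entries = primary.log[self.position:]
        for key, value in new_entries:
            self.data[key] = value
        self.position += len(new_entries)
        return len(new_entries)


primary = Primary({"a": 1})
primary.write("b", 2)
replica = Replica(primary)

primary.write("a", 10)
primary.write("c", 3)
stale = dict(replica.data)             # the replica lags behind until it catches up
applied = replica.catch_up(primary)
print(stale, applied, replica.data)
```

Between the write on the primary and the call to `catch_up` the replica serves old data. That replication lag is the price of not waiting for the replica on every write.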


### Notice Master and slave...

# Replication in Mongo
### A recovery scenario

| all well | fail | recovery |
|:----:|:----:|:-----:|
|![](images/mongo_rep1.png)|![](images/mongo_rep2.png)|![](images/mongo_rep3.png)|


# Consistency, Availability and Partition tolerance
* Consistency: Every read receives the most recent write or an error
* Availability: Every request receives a (non-error) response – without the guarantee that it contains the most recent write
* Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes

Notice: CAP is only relevant if you distribute your database

# CAP Theorem: you can only guarantee two of 
* Consistency, 
* Availability and 
* Partition tolerance

# Assignment