# Understanding Transaction, Commit and Rollback

**What is a Transaction?**
* A group of related operations that must succeed or fail together, to maintain consistency in a database.
* A transaction is like a package of database operations that belong together. Think of it as a to-do list where either everything gets done, or nothing does.
* For example, when you buy something online, the transaction might include: checking your balance, deducting money from your account, and updating the store's inventory. These steps must all succeed together for the purchase to be valid.

---

**What is a Commit?**
* A commit is like pressing the "Save" button on your changes.
* When you commit a transaction, you're telling the database, "I'm sure about all these changes - make them permanent".
* It's similar to signing a contract - once you commit, the changes are officially recorded and can't be easily undone.

---

**What is a Rollback?**
* A rollback is like an "undo" button for your transaction.
* If anything goes wrong during your transaction (like a network error or insufficient funds), a rollback returns everything to how it was before the transaction started.
* It's similar to erasing everything you wrote on a page and starting fresh.
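
The difference between committing and rolling back can be tried out with a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table and the `alice` row are made up for illustration, and other databases may use slightly different syntax for starting a transaction.

```python
import sqlite3

# In-memory database. isolation_level=None means we issue BEGIN/COMMIT/ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100)")  # hypothetical sample row


def balance(account_id):
    """Return the current balance as seen by this connection."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]


# Transaction 1: committed, so the change is kept.
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 'alice'")
conn.execute("COMMIT")
after_commit = balance("alice")

# Transaction 2: rolled back, so the change is discarded.
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 'alice'")
inside_transaction = balance("alice")  # the transaction sees its own pending change
conn.execute("ROLLBACK")
after_rollback = balance("alice")

print(after_commit, inside_transaction, after_rollback)
```

The committed deduction survives, while the rolled-back deduction is visible only inside its own transaction and disappears afterwards.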

---

**What is a Savepoint?**

* A SAVEPOINT is like placing multiple bookmarks while writing an important document.
* If you make a mistake, instead of erasing everything and starting over, you can go back to the last bookmark and continue from there.
* A `SAVEPOINT` creates a named checkpoint inside a transaction, so you can roll back to that specific point and keep the earlier work instead of discarding all changes.


---

**EXAMPLE: FULL ROLLBACK**
* You are transferring money from your bank account to a friend’s account.
* The process involves deducting money from Account A and adding it to Account B.
* Suppose the system fails after deducting the money from Account A but before adding it to Account B.
* In that case, the deduction from Account A must be rolled back so that no money disappears and the data stays consistent.

```sql
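-- T-SQL (SQL Server) syntax; 'A' and 'B' stand in for real account IDs
BEGIN TRY
    BEGIN TRANSACTION;
    -- Deduct from Account A, then add to Account B
    UPDATE Accounts SET Balance = Balance - 500 WHERE AccountID = 'A';
    UPDATE Accounts SET Balance = Balance + 500 WHERE AccountID = 'B';
    -- Both updates succeeded: make the changes permanent
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Any error lands here: undo the whole transaction, including the deduction
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;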
```


**EXAMPLE: PARTIAL ROLLBACK**
```sql

-- Create a sample table if it does not exist
CREATE TABLE IF NOT EXISTS Employees (
    EmpID INT,
    Name VARCHAR(50),
    Salary DECIMAL(10, 2)
);

-- Start a transaction
BEGIN TRANSACTION;

-- Insert a new employee
INSERT INTO Employees (EmpID, Name, Salary) VALUES (1, 'John Doe', 50000);

-- Create a savepoint
SAVEPOINT sp1;

-- Insert another employee
INSERT INTO Employees (EmpID, Name, Salary) VALUES (2, 'Jane Smith', 60000);

-- Create another savepoint
SAVEPOINT sp2;

-- Insert another employee
INSERT INTO Employees (EmpID, Name, Salary) VALUES (3, 'Jordan', 70000);

-- Create another savepoint
SAVEPOINT sp3;

-- Insert another employee
-- Assume this record was entered again by mistake (EmpID 1 already exists)
INSERT INTO Employees (EmpID, Name, Salary) VALUES (1, 'John Doe', 50000);

SELECT * FROM Employees;

-- Since the last INSERT was a mistake, roll back to the last savepoint (sp3)
ROLLBACK TO sp3;

-- .print is a sqlite3 shell command, not standard SQL
.print "After ROLLBACK TO sp3, only the duplicate insert is undone"

-- Now commit the transaction
COMMIT;

-- Verify the data
SELECT * FROM Employees;

```



# ACID Properties

ACID properties are like a set of promises your database makes to keep your data safe and reliable, even when things go wrong. Think of them as safety rules that ensure your data stays accurate and consistent, just like safety protocols in a bank ensure your money is handled correctly.

**What does ACID stand for?**

* **A - Atomicity**:
    * All or nothing - like a pass code where every digit must be correct. Either the entire operation completes successfully, or none of it does.
    * Atomicity ensures that either all parts of a transaction succeed or none do.
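    * If the system crashes or an error occurs before the transaction is committed, the database rolls back all of its uncommitted changes. While a transaction is still running, the application can choose to undo part of the work by rolling back to a specific `SAVEPOINT`, but whatever is finally committed takes effect as a single unit.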

* **C - Consistency**:
    * Your data should always follow the rules.
    * Consistency ensures that data always follows business rules and constraints.
    * For example,
        * If a discount is applied, the system must enforce the rule that the total price cannot be negative, ensuring that the database remains in a valid state.
        * A bank account can't go below zero if its account type doesn't allow overdrafts.

* **I - Isolation**:
    * Isolation ensures that multiple transactions do not interfere with each other.
    * For example, if two customers try to book the same seat at the same time, one transaction completes first and locks the seat, while the other fails or is retried, which prevents double booking.

* **D - Durability**:
    * Durability guarantees that once a transaction is committed, its changes are permanently stored in the database.
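    * Even if the system crashes or there is a power failure, committed data is not lost. This is typically achieved by writing changes to a transaction log on non-volatile storage before the commit is acknowledged; backups and replication add further protection against media failure.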
    * For example, if you successfully book a flight ticket and the system crashes right after, your booking should still be saved when the system restarts.
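
Atomicity and consistency can be seen working together in a small sketch, again using Python's `sqlite3` module. The `accounts` table, the `CHECK (balance >= 0)` rule, and the `transfer` helper are assumptions made for this example, not part of any particular banking system.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule: no overdrafts
)
conn.execute("INSERT INTO accounts VALUES ('A', 100), ('B', 0)")


def transfer(amount):
    """Move `amount` from A to B. Return True if committed, False if rolled back."""
    conn.execute("BEGIN")
    try:
        # Credit B first, so a rejected debit leaves half-finished work to undo.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = 'B'", (amount,))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = 'A'", (amount,))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit: undo the credit as well.
        conn.execute("ROLLBACK")
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM accounts"))


first_ok = transfer(60)    # A has enough money
second_ok = transfer(60)   # A would drop below zero, so the constraint fails
print(first_ok, second_ok, balances())
```

The second transfer violates the rule, so the credit to B that had already been applied inside the transaction is undone too, and both balances stay as they were after the first transfer.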

**Why do we need ACID?**

Real-World Importance:
* **Critical Systems**: Medical records, banking transactions, and flight bookings can't afford data inconsistencies.
* **Multiple Users**: Modern applications serve thousands of users simultaneously, and their actions must not conflict.
* **System Failures**: Power outages, network issues, or hardware failures shouldn't corrupt or lose our data.

# Locking Mechanism

Locking mechanisms are crucial for maintaining data consistency and integrity in multi-user database environments. 
* They prevent concurrent transactions from interfering with each other and ensure that data is accessed and modified in a predictable and reliable manner. 
* Two common types of locks are **row-level locks** and **table-level locks**.

## Row-Level Locks

**Row-level locks** are the most granular type of lock. 
* They restrict access to individual rows within a table.
* This allows multiple transactions to concurrently access and modify different rows of the same table without blocking each other.
* This leads to higher concurrency and improved performance, especially in situations with frequent updates to different rows.

**Advantages of Row-Level Locks**:
* **High Concurrency**: Multiple transactions can operate on different rows of the same table concurrently.
* **Reduced Contention**: Less blocking and waiting time for transactions.
* **Improved Performance**: Faster transaction processing due to increased concurrency.

**Disadvantages of Row-Level Locks**:
* **Increased Overhead**: Managing a large number of row-level locks can consume significant system resources.
* **Potential for Deadlocks**: Deadlocks can occur if two or more transactions are waiting for each other to release locks on rows.

**Example Scenario**:
* Imagine two users, Alice and Bob, are updating customer information in a database.
* Alice is updating customer ID 123, while Bob is updating customer ID 456.
* With row-level locking, Alice's update on row 123 won't block Bob's update on row 456, and vice versa.
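
The Alice and Bob scenario can be imitated with a toy lock manager that keeps one lock per row ID. This is only a sketch of the idea in plain Python; `RowLockManager` is a hypothetical name, and real database engines implement row locks very differently.

```python
import threading
from collections import defaultdict


class RowLockManager:
    """Toy lock manager: one independent lock per row ID."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the dictionary itself

    def try_lock(self, row_id):
        """Try to lock a row without waiting. Return True on success."""
        with self._guard:
            lock = self._locks[row_id]
        return lock.acquire(blocking=False)

    def unlock(self, row_id):
        self._locks[row_id].release()


locks = RowLockManager()
alice_has_123 = locks.try_lock(123)  # Alice locks customer 123
bob_has_456 = locks.try_lock(456)    # Bob locks customer 456 without waiting
bob_has_123 = locks.try_lock(123)    # Bob would have to wait: Alice holds this row

locks.unlock(123)
locks.unlock(456)
print(alice_has_123, bob_has_456, bob_has_123)
```

Only the attempt on the row that is already locked fails; work on different rows proceeds independently.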


## Table-Level Locks

**Table-level locks** are less granular than row-level locks. 
* When a transaction acquires an exclusive table-level lock, it locks the entire table, preventing other transactions from modifying any row in that table (and, depending on the database and lock mode, from reading it) until the lock is released.
* Table-level locks are simpler to manage than row-level locks, but they can significantly reduce concurrency.

**Advantages of Table-Level Locks**:
* **Simpler Management**: Fewer locks to manage compared to row-level locks.
* **Lower Overhead**: Less system resources are required to manage table-level locks.

**Disadvantages of Table-Level Locks**:
* **Reduced Concurrency**: Only one transaction can modify the table at a time; under an exclusive lock, readers must wait as well.
* **Increased Contention**: Transactions may have to wait longer for the table to become available.
* **Poorer Performance**: Slower transaction processing due to reduced concurrency.

**Example Scenario**:
* Imagine a user is running a batch update that modifies a large number of rows in the customer table.
* The database (or the user, explicitly) may acquire a table-level lock to prevent other users from accessing or modifying the customer table during the batch update.
* This ensures the consistency of the data during the update, but also means other users will have to wait.
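
Coarse-grained locking is easy to observe with SQLite, which locks the whole database file for writes (even coarser than a single table). The sketch below assumes a throwaway file named `shop.db` and a hypothetical `customers` table; `timeout=0` makes the second connection fail immediately instead of waiting.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shop.db")  # throwaway database file
batch = sqlite3.connect(path, isolation_level=None, timeout=0)
other = sqlite3.connect(path, isolation_level=None, timeout=0)

batch.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
batch.execute("INSERT INTO customers VALUES (123, 'Alice'), (456, 'Bob')")

# The batch job takes the write lock and updates every row.
batch.execute("BEGIN IMMEDIATE")
batch.execute("UPDATE customers SET name = upper(name)")

# Another user tries to start a write while the batch is still running.
try:
    other.execute("BEGIN IMMEDIATE")
    other_blocked = False
except sqlite3.OperationalError:  # "database is locked"
    other_blocked = True

batch.execute("COMMIT")  # lock released

# Now the other user can proceed.
other.execute("BEGIN IMMEDIATE")
other.execute("UPDATE customers SET name = 'Bobby' WHERE id = 456")
other.execute("COMMIT")
names = [row[0] for row in other.execute("SELECT name FROM customers ORDER BY id")]

batch.close()
other.close()
print(other_blocked, names)
```

The second writer is refused even though it wants a different row, which is exactly the concurrency cost described above.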

## Locking Questions

A banking system allows multiple users to update their account balances simultaneously. However, to ensure consistency, it uses a locking mechanism. 

**Which type of lock would allow two different users to update different accounts at the same time without blocking each other?**

A **Row-Level Lock** ensures that each transaction locks only the specific row it needs to modify, allowing different users to update different rows simultaneously. This improves concurrency and reduces contention, making it ideal for systems with frequent updates to different records, such as banking transactions.


# Deadlock

A **deadlock** occurs when two or more transactions are blocked indefinitely, waiting for each other to release the resources (locks) that they need. 
* Imagine two trains approaching a single-track bridge from opposite directions.
* Neither can proceed until the other backs up, but neither wants to back up.
* This is analogous to a deadlock in a database.

**Conditions for Deadlock**

Four necessary conditions must hold for a deadlock to occur:
* **Mutual Exclusion**: A resource can only be held by one transaction at a time.
* **Hold and Wait**: A transaction holds at least one resource while waiting to acquire additional resources held by other transactions.
* **No Preemption**: Resources cannot be forcibly taken away from a transaction; they must be released voluntarily.
* **Circular Wait**: A set of transactions exists such that each transaction is waiting for a resource held by another transaction in the set.

**Example Scenario**:
* Transaction A holds a lock on row 1 and is waiting for a lock on row 2.
* Transaction B holds a lock on row 2 and is waiting for a lock on row 1.
* Neither transaction can proceed, resulting in a deadlock.
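
The scenario above can be modelled as a "wait-for" graph, where an edge from A to B means "A is waiting for a lock that B holds". A deadlock corresponds to a cycle in that graph. The following is a minimal sketch of cycle detection; the function name `find_cycle` and the dictionary layout are choices made for this example.

```python
def find_cycle(waits_for):
    """Return a list of transactions that form a cycle, or None if there is none.

    `waits_for` maps each transaction to the set of transactions it is waiting on.
    """
    visiting = []   # transactions on the current search path
    done = set()    # transactions already proven to be cycle-free

    def visit(txn):
        if txn in done:
            return None
        if txn in visiting:
            # We came back to a transaction on the current path: that is a cycle.
            return visiting[visiting.index(txn):]
        visiting.append(txn)
        for other in sorted(waits_for.get(txn, ())):
            cycle = visit(other)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in sorted(waits_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None


# A waits for B (row 2) and B waits for A (row 1): a circular wait.
deadlocked = find_cycle({"A": {"B"}, "B": {"A"}})
# A waits for B, but B is not waiting for anyone: no deadlock.
not_deadlocked = find_cycle({"A": {"B"}, "B": set()})
print(deadlocked, not_deadlocked)
```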

**Handling Deadlocks**

Databases employ various strategies to handle deadlocks:
* **Deadlock Detection**: The database periodically checks for circular wait conditions.
* **Deadlock Prevention**: Strategies that make at least one of the four necessary conditions impossible (e.g., always requesting resources in a fixed order to rule out circular wait, or acquiring all locks up front to rule out hold and wait).
* **Deadlock Avoidance**: Carefully allocating resources to prevent the possibility of deadlocks.
* **Deadlock Resolution (Victim Selection)**: Choosing one or more "victim" transactions to be rolled back, releasing their locks and allowing other transactions to proceed. The choice of victim is often based on factors like the transaction's duration, importance, or the amount of work already done.
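
One prevention strategy listed above, ordering resource requests, can be sketched with ordinary Python threads. Here the "rows" are entries in a dictionary and each has its own lock; all names are illustrative. Because both transfers lock the lower row ID first, a circular wait cannot form even though the transfers run in opposite directions.

```python
import threading

row_locks = {1: threading.Lock(), 2: threading.Lock()}
rows = {1: 100, 2: 100}  # hypothetical balances stored in row 1 and row 2


def transfer(src, dst, amount):
    # Always lock the lower row ID first, whatever the transfer direction.
    first, second = sorted((src, dst))
    with row_locks[first]:
        with row_locks[second]:
            rows[src] -= amount
            rows[dst] += amount


def worker(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)


threads = [
    threading.Thread(target=worker, args=(1, 2)),  # moves money from row 1 to row 2
    threading.Thread(target=worker, args=(2, 1)),  # moves money from row 2 to row 1
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(rows)
```

If each transfer instead locked `src` first and `dst` second, the two threads could end up in the situation from the example scenario, each holding one row and waiting for the other.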

# Isolation Levels

**Isolation levels** control the degree to which transactions are isolated from each other.
* They define the visibility of changes made by one transaction to other concurrent transactions.
* Higher isolation levels provide greater consistency but can reduce concurrency.
* Lower isolation levels increase concurrency but may lead to various concurrency anomalies.

**Common Isolation Levels (ANSI SQL Standard)**
* **Read Uncommitted**:
    * The lowest level.
    * Transactions can read uncommitted changes made by other transactions (dirty reads).
    * This level offers the highest concurrency but is most prone to inconsistencies.
* **Read Committed**:
    * Transactions can only read committed changes made by other transactions.
    * Dirty reads are prevented, but non-repeatable reads and phantom reads can still occur.
* **Repeatable Read**:
    * Rows that a transaction has already read are guaranteed to return the same values if they are read again within that transaction.
    * Non-repeatable reads are prevented, but phantom reads can still occur.
* **Serializable**:
    * The highest level. Transactions appear to execute as if they were running one at a time (serially).
    * This level provides the highest consistency but can significantly reduce concurrency.

**Concurrency Anomalies**
* **Dirty Read**: Reading uncommitted data.
* **Non-Repeatable Read**: Reading the same row twice within a transaction, and the row's value is changed by another transaction in between.
* **Phantom Read**: A transaction reads a set of rows based on a condition. Another transaction inserts or deletes rows that match the condition, causing the first transaction to see a different set of rows if it repeats the read.
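
Dirty reads and non-repeatable reads can be illustrated with a deliberately simplified toy model in Python. It is not how any real engine works: `ToyDatabase` and `ToyTransaction` are invented for this sketch, and "repeatable read" is modelled with a snapshot taken at transaction start, which is only one of several possible implementation approaches.

```python
class ToyDatabase:
    def __init__(self, data):
        self.committed = dict(data)  # values visible after commit


class ToyTransaction:
    def __init__(self, db, level):
        self.db = db
        self.level = level
        self.writes = {}  # this transaction's uncommitted changes
        # Assumption of this toy model: repeatable read = snapshot taken at start.
        self.snapshot = dict(db.committed) if level == "repeatable read" else None

    def read(self, key, others=()):
        if self.level == "read uncommitted":
            for other in others:
                if key in other.writes:
                    return other.writes[key]  # dirty read of uncommitted data
        if self.snapshot is not None:
            return self.snapshot[key]
        return self.db.committed[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.db.committed.update(self.writes)
        self.writes = {}


def read_twice(level):
    """Read the same key before and after a concurrent writer commits."""
    db = ToyDatabase({"seat_price": 100})
    reader = ToyTransaction(db, level)
    writer = ToyTransaction(db, "read committed")
    writer.write("seat_price", 120)
    first = reader.read("seat_price", others=[writer])   # writer not committed yet
    writer.commit()
    second = reader.read("seat_price", others=[writer])  # writer has committed
    return first, second


results = {
    level: read_twice(level)
    for level in ("read uncommitted", "read committed", "repeatable read")
}
print(results)
```

In this model the read-uncommitted reader sees the new price before it is committed (a dirty read), the read-committed reader sees two different values (a non-repeatable read), and the repeatable-read reader sees the original value both times.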

**Choosing an Isolation Level**
* The appropriate isolation level depends on the application's requirements.
* Applications that require high consistency should use higher isolation levels, while applications that prioritize concurrency can use lower isolation levels.
* Most database systems allow you to configure the isolation level at the session or transaction level.