<a href="https://colab.research.google.com/github/AdarshKhatri01/DBMS-Notes/blob/main/RECOVERY.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Recovery Concepts and Techniques in DBMS**  

#### **1. Introduction to Recovery in DBMS**  
Database recovery refers to the process of restoring the database to a consistent state after a failure, such as system crashes, disk failures, or software bugs. The recovery process ensures data integrity and prevents data loss.

#### **2. Types of Failures in DBMS**
- **Transaction Failure:** Occurs when a single transaction cannot complete, due to logical errors (e.g., divide by zero, invalid input) or system-imposed conditions such as a deadlock.
- **System Crash:** Sudden failure due to hardware or software issues.
- **Disk Failure:** Physical damage to the storage device leading to data loss.
- **Deadlock Termination:** If a deadlock is detected, one or more transactions may be terminated.


#### **3. Why is Recovery Needed?**
- To ensure Atomicity (All-or-Nothing property of transactions).
- To recover from system crashes and hardware failures.
- To maintain data consistency and prevent data corruption.
- To ensure durability (once a transaction is committed, it remains permanent).

#### **4. Recovery Techniques in DBMS**
There are several recovery techniques used to restore a database after a failure:

##### **A. Log-Based Recovery**  
A log is a record of all the changes made to the database. The log file helps in recovering lost data.  
- **Write-Ahead Logging (WAL):** The log must be written to disk before actual database modifications.
- **Undo (Rollback):** Reverts changes of uncommitted transactions.
- **Redo (Rollforward):** Reapplies changes of committed transactions.

##### **B. Checkpointing**  
- A checkpoint is a point in time where all changes are saved to disk.
- If a failure occurs, recovery starts from the last checkpoint instead of scanning the entire log.

##### **C. Shadow Paging**  
- Maintains two copies of data: one active and one shadow copy.
- The shadow copy remains unchanged until the transaction is committed.
- Provides fast recovery but increases storage requirements.

##### **D. ARIES (Algorithm for Recovery and Isolation Exploiting Semantics)**  
ARIES is a widely used recovery technique involving:  
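1. **Analysis Phase:** Scans the log to identify the transactions that were active at the crash and the pages that may hold unflushed changes.  
2. **Redo Phase:** Repeats history by reapplying all logged updates (committed or not) that may not have reached disk.  
3. **Undo Phase:** Rolls back the transactions that were still uncommitted at the crash.  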

##### **E. Deferred Update**  
- Changes are recorded in the log but not applied to the database until the transaction commits.
- If a failure occurs before commit, changes are ignored.

##### **F. Immediate Update**  
- Changes are applied to the database immediately but stored in a log.
- If a failure occurs, committed changes are redone, and uncommitted changes are undone.

#### **5. Conclusion**  
Database recovery techniques ensure that data remains consistent and available despite failures. Different techniques are used based on system requirements, ensuring data integrity and durability.

# **Log-Based Recovery in DBMS**  

## **1. Introduction to Log-Based Recovery**  
Log-based recovery is a technique used in **Database Management Systems (DBMS)** to ensure that a database can be recovered to a consistent state after a failure. This method relies on a **transaction log**, which records all changes made to the database before they are actually applied.

### **Why is Log-Based Recovery Needed?**  
- Ensures **Atomicity** (All-or-nothing execution of transactions).
- Maintains **Durability** (Committed changes persist even after system failures).
- Helps recover from **system crashes**, **transaction failures**, and **media failures**.
- Supports **Undo (Rollback)** and **Redo (Rollforward)** operations.

---

## **2. Transaction Log (Write-Ahead Log - WAL)**
A **transaction log** is a file where the DBMS records all changes before they are written to the database. The log typically contains:

| **Field** | **Description** |
|-----------|---------------|
| **Transaction ID (TID)** | Unique ID for each transaction |
| **Transaction Type** | Insert, Update, Delete, etc. |
| **Object/Record ID** | The affected database record |
| **Before-Image** | The value before modification (used for Undo) |
| **After-Image** | The new value after modification (used for Redo) |
| **Commit/Abort Status** | Indicates if the transaction was committed or aborted |
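
The fields in the table above could be modelled roughly as follows. This is only an illustrative sketch: the class name `LogRecord`, its field names, and the `kind` values are assumptions made for the example, not the layout of any particular DBMS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical in-memory shape of one transaction log entry."""
    tid: str                          # Transaction ID
    kind: str                         # "START", "UPDATE", "COMMIT" or "ABORT"
    record_id: Optional[str] = None   # affected record (UPDATE entries only)
    before: Optional[int] = None      # before-image, used for Undo
    after: Optional[int] = None       # after-image, used for Redo


log = [
    LogRecord("T1", "START"),
    LogRecord("T1", "UPDATE", "Employee_ID=101", before=50000, after=60000),
    LogRecord("T1", "COMMIT"),
]

# The commit/abort status is recovered by looking for COMMIT entries.
committed = {rec.tid for rec in log if rec.kind == "COMMIT"}
```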

---

## **3. Write-Ahead Logging (WAL) Protocol**
The **WAL protocol** ensures that:
1. **Before making any changes to the database, the log is written to stable storage (disk)**.
2. If a failure occurs, the log can be used to restore the database using **Undo (Rollback)** or **Redo (Rollforward)** operations.

**Key Rule of WAL:**  
➡️ **"Log first, update later."**  
- Changes must be written to the log **before** they are applied to the database.
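
A minimal sketch of the "log first, update later" ordering is shown here. The class `WalStore` and its attributes are hypothetical, and plain Python containers stand in for stable storage; a real system would also force the log to disk (for example with `fsync`) before touching data pages.

```python
class WalStore:
    """Toy store that always appends to the log before changing data."""

    def __init__(self):
        self.stable_log = []   # stands in for the log on disk
        self.database = {}     # stands in for data pages on disk
        self.events = []       # order of operations, kept for illustration

    def update(self, tid, key, new_value):
        old_value = self.database.get(key)
        # 1. Log first: record the before- and after-image.
        self.stable_log.append((tid, "UPDATE", key, old_value, new_value))
        self.events.append("log")
        # 2. Update later: only now modify the database.
        self.database[key] = new_value
        self.events.append("data")


store = WalStore()
store.update("T1", "Balance", 7000)
```

Because the log entry always exists before the data changes, a crash between the two steps still leaves enough information to undo or redo the update.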

---

## **4. Log-Based Recovery Techniques**
There are two primary operations in log-based recovery:

### **A. Undo (Rollback) Operation**  
- Used when a transaction **fails before commit**.
- The system uses the **before-image** stored in the log to revert changes.

#### **Example of Undo Operation**  
- A transaction **T1** updates a salary from **₹50,000 to ₹60,000**.
- Before making the change, the log stores:  
  ```
  [T1, UPDATE, Employee_ID=101, BEFORE: ₹50,000, AFTER: ₹60,000]
  ```
- If T1 **fails before commit**, the database restores ₹50,000 using the **before-image**.
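
The undo step for this example might look like the sketch below. The tuple layout `(tid, op, key, before, after)` and the function name `undo` are assumptions chosen for illustration.

```python
def undo(database, log, tid):
    """Restore the before-image of every update made by `tid`, newest first."""
    for rec in reversed(log):
        if rec[0] == tid and rec[1] == "UPDATE":
            _, _, key, before, _after = rec
            database[key] = before


# State at the time of failure: T1's change is in the database, but T1 never committed.
database = {"Employee_ID=101": 60000}
log = [("T1", "UPDATE", "Employee_ID=101", 50000, 60000)]

undo(database, log, "T1")
```

Scanning the log backwards matters when a transaction updates the same record more than once: the oldest before-image is applied last, so it wins.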

---

### **B. Redo (Rollforward) Operation**  
- Used when a transaction **commits but changes were not saved to the database due to failure**.
- The system uses the **after-image** stored in the log to reapply committed changes.

#### **Example of Redo Operation**  
- A transaction **T2** updates the stock quantity of a product from **100 to 90**.
- The system logs:  
  ```
  [T2, UPDATE, Product_ID=200, BEFORE: 100, AFTER: 90]
  ```
- If T2 **commits** but a system crash occurs before changes are saved, the database **reapplies** 90 using the **after-image**.
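
A comparable sketch for redo follows, again assuming a hypothetical tuple layout `(tid, op, key, before, after)` for updates and `(tid, "COMMIT")` for commits.

```python
def redo(database, log):
    """Reapply the after-image of every update from a committed transaction."""
    committed = {rec[0] for rec in log if rec[1] == "COMMIT"}
    for rec in log:  # forward scan, oldest first
        if rec[1] == "UPDATE" and rec[0] in committed:
            _, _, key, _before, after = rec
            database[key] = after


# State after the crash: T2 committed, but the new quantity never reached disk.
database = {"Product_ID=200": 100}
log = [
    ("T2", "UPDATE", "Product_ID=200", 100, 90),
    ("T2", "COMMIT"),
]

redo(database, log)
```

Redo written this way is idempotent: running it a second time (for example after a crash during recovery) leaves the same result.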

---

## **5. Types of Log-Based Recovery Approaches**

### **A. Deferred Update (No Undo, Only Redo)**
- Changes are **not written** to the database until the transaction commits.
- Only **Redo is needed** (since no uncommitted changes exist).
- Uses **Transaction Log** to apply committed updates.

**Process:**
1. Log the changes.
2. Apply changes to the database **only after commit**.
3. If failure occurs before commit, simply discard the log (Rollback not needed).

**Example:**
- Transaction updates balance **₹10,000 → ₹12,000**.
- The change is **logged** but not written to the database.
- If the system crashes before commit → No recovery needed.
- If the system crashes after commit → Apply the **Redo log**.
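
One way to picture deferred update is a transaction object that buffers its writes until commit. The class `DeferredTxn` and its methods are hypothetical names for this sketch, and in-memory containers stand in for the log and the database.

```python
class DeferredTxn:
    """Toy transaction that defers all database writes until commit."""

    def __init__(self, tid, database, log):
        self.tid = tid
        self.database = database
        self.log = log
        self.pending = {}   # buffered writes, not yet in the database

    def write(self, key, value):
        self.pending[key] = value   # the database is left untouched

    def commit(self):
        # Log the new values and the commit record first ...
        for key, value in self.pending.items():
            self.log.append((self.tid, "UPDATE", key, value))
        self.log.append((self.tid, "COMMIT"))
        # ... then apply the deferred writes.
        self.database.update(self.pending)


database = {"Balance": 10000}
log = []
txn = DeferredTxn("T1", database, log)
txn.write("Balance", 12000)
balance_before_commit = database["Balance"]   # still the old value
txn.commit()
```

Only after-images are logged here, which is why this approach needs Redo but never Undo.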

### **B. Immediate Update (Undo & Redo Required)**
- Changes are written **immediately** to the database before commit.
- **Both Undo & Redo** operations are needed.

**Process:**
1. Log the **before-image & after-image**.
2. Apply updates **immediately** to the database.
3. If failure occurs **before commit** → Undo using the **before-image**.
4. If failure occurs **after commit** → Redo using the **after-image**.

**Example:**
- Transaction updates balance **₹10,000 → ₹12,000**.
- The update is **immediately** applied to the database.
- If the system crashes before commit → Rollback to ₹10,000 using Undo.
- If the system crashes after commit → Ensure ₹12,000 using Redo.
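
With immediate update, recovery has to handle both cases in one pass over the log. The sketch below is a simplified illustration, not a production algorithm; the function name `recover`, the tuple layout, and the second transaction `T2` updating `Price` are assumptions added for the example.

```python
def recover(database, log):
    """Undo updates of uncommitted transactions, then redo committed ones."""
    committed = {rec[0] for rec in log if rec[1] == "COMMIT"}
    updates = [rec for rec in log if rec[1] == "UPDATE"]

    # Undo pass: newest first, restore before-images of uncommitted work.
    for tid, _, key, before, _after in reversed(updates):
        if tid not in committed:
            database[key] = before

    # Redo pass: oldest first, reapply after-images of committed work.
    for tid, _, key, _before, after in updates:
        if tid in committed:
            database[key] = after


log = [
    ("T1", "UPDATE", "Balance", 10000, 12000),
    ("T1", "COMMIT"),
    ("T2", "UPDATE", "Price", 500, 600),   # T2 never commits
]
# On-disk state after the crash: T1's write was lost, T2's write got through.
database = {"Balance": 10000, "Price": 600}

recover(database, log)
```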

---

## **6. Checkpointing in Log-Based Recovery**
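- A **checkpoint** is a point where all changes made by committed transactions have been flushed to the database on disk. Log records before this point are no longer needed for crash recovery, provided no transaction was still active when the checkpoint was taken.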
- Reduces recovery time after a failure.

### **Checkpointing Process**
1. The database writes all committed transactions to disk.
2. A **"Checkpoint"** entry is added to the log.
3. During recovery, only logs **after the checkpoint** are processed.

**Example of Log Entries with Checkpoint**
```
[T1, START]
[T1, UPDATE, Balance, BEFORE: 5000, AFTER: 7000]
[T1, COMMIT]
---CHECKPOINT---
[T2, START]
[T2, UPDATE, Salary, BEFORE: 40000, AFTER: 45000]
```
- If failure occurs, logs before the **CHECKPOINT** are ignored.
- Only **T2's log** is processed for recovery. Since T2 never committed, its update is undone.
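
The log above can be processed as in the following sketch. The tuple encoding and the helper name `records_after_checkpoint` are assumptions for illustration, and the sketch assumes no transaction was active when the checkpoint was written.

```python
log = [
    ("T1", "START"),
    ("T1", "UPDATE", "Balance", 5000, 7000),
    ("T1", "COMMIT"),
    ("CHECKPOINT",),
    ("T2", "START"),
    ("T2", "UPDATE", "Salary", 40000, 45000),
]


def records_after_checkpoint(log):
    """Return only the log records written after the most recent checkpoint."""
    last = max(
        (i for i, rec in enumerate(log) if rec[0] == "CHECKPOINT"),
        default=-1,   # no checkpoint: the whole log must be scanned
    )
    return log[last + 1:]


tail = records_after_checkpoint(log)
committed = {rec[0] for rec in tail if rec[1] == "COMMIT"}
to_undo = {rec[0] for rec in tail if rec[1] == "START"} - committed
```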

---

## **7. ARIES (Advanced Recovery Algorithm)**
- **ARIES (Algorithm for Recovery and Isolation Exploiting Semantics)** is a popular log-based recovery method.
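- It follows **Write-Ahead Logging (WAL)** and **Three-Phase Recovery**:
  1. **Analysis Phase** – Identify transactions active at failure time and pages that may hold unflushed changes.
  2. **Redo Phase** – Repeat history by reapplying all logged updates (committed or not) that may not have reached disk.
  3. **Undo Phase** – Roll back the transactions that were still uncommitted at the crash.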
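
The three phases can be illustrated with a heavily simplified sketch. It is not full ARIES: it omits checkpoints, the dirty page table, and compensation log records, and the record layout, the `page_lsn` field, and the function name `aries_like_recover` are assumptions made for the example.

```python
def aries_like_recover(log, pages):
    """Simplified three-phase recovery over a list of log records.

    log:   dicts with "lsn", "tid", "type" and, for updates, "page", "before", "after"
    pages: page name -> {"value": ..., "page_lsn": LSN of the last applied update}
    """
    # Analysis: transactions that started but never committed are "losers".
    losers = set()
    for rec in log:
        if rec["type"] == "START":
            losers.add(rec["tid"])
        elif rec["type"] == "COMMIT":
            losers.discard(rec["tid"])

    # Redo: repeat history for every update the page has not seen yet.
    for rec in log:
        if rec["type"] == "UPDATE":
            page = pages[rec["page"]]
            if page["page_lsn"] < rec["lsn"]:
                page["value"] = rec["after"]
                page["page_lsn"] = rec["lsn"]

    # Undo: roll back loser transactions, newest update first.
    for rec in reversed(log):
        if rec["type"] == "UPDATE" and rec["tid"] in losers:
            pages[rec["page"]]["value"] = rec["before"]

    return losers


log = [
    {"lsn": 1, "tid": "T1", "type": "START"},
    {"lsn": 2, "tid": "T1", "type": "UPDATE", "page": "A", "before": 10, "after": 20},
    {"lsn": 3, "tid": "T2", "type": "START"},
    {"lsn": 4, "tid": "T2", "type": "UPDATE", "page": "B", "before": 5, "after": 8},
    {"lsn": 5, "tid": "T1", "type": "COMMIT"},
]
# On-disk pages at the crash: A was never flushed, B was flushed with T2's change.
pages = {
    "A": {"value": 10, "page_lsn": 0},
    "B": {"value": 8, "page_lsn": 4},
}

losers = aries_like_recover(log, pages)
```

Comparing each record's LSN with the page's `page_lsn` is what lets the redo phase skip updates that already reached disk.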

---

## **8. Comparison of Log-Based Recovery Approaches**
| **Feature** | **Deferred Update** | **Immediate Update** |
|------------|------------------|------------------|
| **Changes Applied** | After Commit | Before Commit |
| **Undo Required?** | No | Yes |
| **Redo Required?** | Yes | Yes |
| **Complexity** | Simple | Moderate |
| **Best Used For** | Low-failure environments | High-performance systems |

---

## **9. Advantages & Disadvantages of Log-Based Recovery**
### **Advantages**
✔ **Ensures Atomicity & Durability** – Transactions remain consistent.  
✔ **Fast Recovery** – Reduces downtime after failures.  
✔ **Efficient Storage** – Only logs need to be stored, not entire backups.  

### **Disadvantages**
❌ **Log Overhead** – Logging increases storage and processing overhead.  
❌ **Performance Impact** – Writing logs before updates can slow performance.  
❌ **Complexity** – Managing logs, checkpoints, and recovery phases requires careful implementation.  

---

## **10. Conclusion**
Log-based recovery is essential for maintaining **data integrity and durability** in DBMS. It provides mechanisms to **undo uncommitted changes** and **redo committed updates** after failures. The **Write-Ahead Logging (WAL) protocol** ensures that logs are saved before actual database updates, making recovery reliable.  

🚀 **Key Takeaways:**  
✔ **Deferred Update** is simple and only requires **Redo**.  
✔ **Immediate Update** supports both **Undo & Redo**.  
✔ **Checkpointing** speeds up recovery.  
✔ **ARIES algorithm** is widely used for advanced recovery in modern databases.  


# **Shadow Paging in DBMS: Concepts & Techniques**  

## **1. Introduction to Shadow Paging**  
Shadow Paging is a **database recovery technique** that ensures atomicity and durability by maintaining two versions of database pages:  
1. **Shadow Pages (Old Version)** → Stores the last committed state.  
2. **Current Pages (New Version)** → Stores changes made by ongoing transactions.  

Unlike log-based recovery, **shadow paging does not require UNDO/REDO logs**. Instead, it ensures consistency by replacing page tables upon **transaction commit** or **discarding changes** upon failure.

---

## **2. How Shadow Paging Works?**
Shadow Paging is based on **two key concepts**:  
✅ **Page Table** → Maps logical pages to physical pages.  
✅ **Shadow Copy Mechanism** → Preserves old data while allowing updates.

### **Steps in Shadow Paging:**
1. **Initial Setup**:
   - A **Page Table** maps logical pages to physical pages.
   - A **Shadow Page Table** (copy of the original) is maintained.

2. **Transaction Execution**:
   - A transaction updates **current pages**, but the **shadow pages remain unchanged**.
   - Updated pages are stored in **new locations**.

3. **Transaction Commit**:
   - The database **switches** to the new Page Table.
   - The **Shadow Page Table is discarded**.

4. **Transaction Rollback**:
   - The system **discards the updated Page Table**.
   - The database continues using the **Shadow Page Table**.

---

## **3. Example of Shadow Paging**
Consider a database with three pages **P1, P2, P3** stored at locations:  

| Logical Page | Physical Page (Before Update) | Physical Page (After Update) |
|-------------|-----------------------------|-----------------------------|
| P1 | Address 101 | Address 201 |
| P2 | Address 102 | Address 102 |
| P3 | Address 103 | Address 203 |

### **Transaction Update Process**
1. **Before Update:**  
   - **Page Table** → Points to P1 (101), P2 (102), P3 (103).  
   - **Shadow Page Table** → Same as Page Table.

2. **During Update:**  
   - P1 changes → New copy at **Address 201**.
   - P3 changes → New copy at **Address 203**.
   - P2 remains unchanged.

3. **Commit:**
   - The database **switches** to the updated Page Table.
   - The old **Shadow Page Table is discarded**.
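
The example above can be traced with a small copy-on-write sketch. The class `ShadowPagedDB`, its method names, and the page contents (`"a"`, `"b"`, `"c"`) are hypothetical; dictionaries stand in for disk storage and for the two page tables.

```python
class ShadowPagedDB:
    """Toy shadow-paging store with a shadow and a current page table."""

    def __init__(self, storage, page_table):
        self.storage = storage                  # physical address -> contents
        self.shadow_table = dict(page_table)    # last committed mapping
        self.current_table = dict(page_table)   # working mapping for the transaction

    def write(self, logical_page, contents, new_address):
        # Copy-on-write: the new version is stored at a fresh address,
        # so the page referenced by the shadow table is never overwritten.
        self.storage[new_address] = contents
        self.current_table[logical_page] = new_address

    def commit(self):
        # Switching page tables is the commit point.
        self.shadow_table = dict(self.current_table)

    def rollback(self):
        # Discard the updated table and fall back to the shadow table.
        self.current_table = dict(self.shadow_table)

    def read_committed(self, logical_page):
        return self.storage[self.shadow_table[logical_page]]


db = ShadowPagedDB(
    storage={101: "a", 102: "b", 103: "c"},
    page_table={"P1": 101, "P2": 102, "P3": 103},
)
db.write("P1", "a2", 201)
db.write("P3", "c2", 203)
before_commit = db.read_committed("P1")   # the shadow table still points to 101
db.commit()
```

In a real system the switch has to be a single atomic disk write (for example, updating one pointer to the root of the page table), which this in-memory sketch does not model.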

---

## **4. Techniques Based on Shadow Paging**
### **A. Basic Shadow Paging**
- The simplest implementation where **every transaction gets a shadow page table**.
- Upon commit, the system **switches** to the updated pages.

**Advantages**:
✅ No need for log-based UNDO/REDO.  
✅ Faster than log-based recovery.  

**Disadvantages**:
❌ **High Storage Overhead** – Every update requires additional space.  
❌ **Not Scalable** – Large databases require frequent page copying.  

---

### **B. Multi-Version Shadow Paging**
- Instead of maintaining just one **shadow page**, multiple versions are kept.
- Transactions can **read older versions** while updates are made.

**Example**:
- **T1 reads version-1 of a page**, while **T2 updates version-2**.
- At commit, **version-2 replaces version-1**.

✅ Supports **Concurrency**.  
✅ Allows **Read-Only Transactions** to access older versions.  

❌ **More Storage Required** – Multiple versions increase disk usage.  

---

### **C. Hybrid Shadow Paging with Logging**
- Combines **shadow paging** with **Write-Ahead Logging (WAL)**.
- Logs are maintained **for rollback and faster recovery**.

**Steps**:
1. Shadow pages store the last committed state.
2. A transaction logs all updates before modifying pages.
3. On **commit**, updated pages replace shadow pages.
4. On **failure**, the log is used to undo uncommitted changes and redo committed ones.

✅ Faster Recovery than **pure shadow paging**.  
✅ Less Storage Overhead than **multi-version shadow paging**.  

❌ Slightly **complex** due to logging integration.  

---

## **5. Comparison with Log-Based Recovery**
| **Feature** | **Shadow Paging** | **Log-Based Recovery** |
|------------|----------------|----------------|
| **UNDO Required?** | ❌ No | ✅ Yes |
| **REDO Required?** | ❌ No | ✅ Yes |
| **Storage Overhead** | 📈 High (Duplicate Pages) | 📉 Moderate (Log Files) |
| **Performance** | 🚀 Faster for small DBs | 🏎️ Efficient for large DBs |
| **Failure Recovery** | ✅ Fast (Switch Tables) | ⏳ Slower (Redo Logs) |

---

## **6. Advantages & Disadvantages of Shadow Paging**
### **Advantages**
✔ **Fast Recovery** – No log processing required.  
✔ **Atomic Transactions** – Changes are either fully committed or discarded.  
✔ **No UNDO/REDO Overhead** – Simplifies transaction recovery.  

### **Disadvantages**
❌ **Storage-Intensive** – Requires duplicate page copies.  
❌ **Not Suitable for Large Databases** – Frequent copying increases I/O overhead.  
❌ **Concurrency Issues** – Hard to manage multiple simultaneous transactions.  

---

## **7. Conclusion**
Shadow Paging is an effective **non-log-based recovery technique** suitable for **small and medium databases**. It provides fast recovery but has **high storage costs**. Hybrid methods combining **shadow paging with logging** are used in **modern DBMS** to improve efficiency.  


# **Database Recovery Techniques: Deferred Update, Immediate Update, Shadow Paging, and Multi-Database Recovery**  

## **1. Introduction to Database Recovery Techniques**  
Database recovery techniques ensure that transactions follow **ACID properties** (Atomicity, Consistency, Isolation, Durability) even in case of failures. Different techniques handle transaction failures in different ways.  

### **Types of Recovery Techniques:**
1. **Deferred Update** – Changes are applied after transaction commit.
2. **Immediate Update** – Changes are applied instantly but can be undone.
3. **Shadow Paging** – Uses a shadow copy for transaction safety.
4. **Recovery in Multi-Database Transactions** – Ensures consistency across distributed databases.  

---

## **2. Deferred Update (Deferred Write)**
### **Concept:**  
- Changes made by a transaction are **not applied** to the database until the transaction commits.
- Before commit, changes are stored in a **log file (Transaction Log)**.
- If the system fails before the commit, **no rollback is required** since changes were never applied.

### **Steps in Deferred Update:**  
1. Transaction begins.
2. All changes are recorded in a **log file** (not in the database).
3. If the transaction commits, changes are written to the database.
4. If the transaction fails, no changes are applied (rollback is unnecessary).

### **Example:**  
- A transaction updates an **account balance** from ₹50,000 to ₹60,000.
- The change is **stored in a log file** but not applied to the database.
- If the transaction commits, the log is processed, and the database is updated.
- If the transaction fails, nothing happens (since no changes were made to the database).  

### **Advantages:**  
✅ Ensures **Atomicity** – No partial updates.  
✅ No need for **Undo** operations.  

### **Disadvantages:**  
❌ Requires **additional storage** for logs.  
❌ Performance overhead due to log processing.  

---

## **3. Immediate Update**
### **Concept:**  
- Changes are **immediately applied** to the database, even before the transaction commits.
- If the transaction fails, **Undo (Rollback)** operations are performed.
- The system maintains logs to track before and after values.  

### **Steps in Immediate Update:**  
1. Transaction begins.  
2. Updates are applied to the database immediately.  
3. A log records the **before-image** and **after-image** of data.  
4. If the transaction **commits**, changes remain.  
5. If the transaction **fails**, an Undo operation reverts changes using the **before-image**.  

### **Example:**  
- A transaction updates an **item price** from ₹500 to ₹600.  
- The new price ₹600 is **immediately written** to the database.  
- The log stores **before-image: ₹500** and **after-image: ₹600**.  
- If the system fails before the transaction commits, the database restores ₹500 using the log.  

### **Advantages:**  
✅ Faster transactions for frequently updated databases.  
✅ Works well with **Write-Ahead Logging (WAL)**.  

### **Disadvantages:**  
❌ Needs **Undo** operations if failure occurs.  
❌ Risk of **inconsistencies** if logs are lost.  

---

## **4. Shadow Paging**
### **Concept:**  
- Instead of overwriting pages in place, updates are written to **new copies** of the affected pages (the current pages).  
- The **original (shadow) pages** remain unchanged until the transaction commits.  
- If the transaction commits, the current pages become the active, committed version.  
- If the transaction fails, the system discards the current pages and keeps the shadow pages, ensuring no partial updates.

### **Steps in Shadow Paging:**  
1. **Transaction begins**, and the page table is copied so that the shadow version stays untouched.  
2. Updates are made to new copies of the pages, never to the shadow pages.  
3. If the transaction commits, the **updated pages replace the shadow pages** as the committed state.  
4. If the transaction fails, the **updated pages are discarded**, and the shadow pages remain.  

### **Example:**  
- A transaction modifies a **customer record** in a bank database.  
- Instead of modifying the page holding the record, a **new copy** of that page is written; the shadow page keeps the old record.  
- If the transaction **commits**, the new copy becomes the committed version.  
- If the transaction **fails**, the new copy is discarded, keeping the original intact.  

### **Advantages:**  
✅ No **Undo/Redo** operations required.  
✅ Provides **fast recovery** after system crashes.  

### **Disadvantages:**  
❌ High **storage overhead** (requires two copies of data).  
❌ Not suitable for **large databases** due to excessive duplication.  

---

## **5. Recovery in Multi-Database Transactions (Distributed Databases)**
### **Concept:**  
- In a **distributed database system**, transactions span across multiple databases.  
- If a failure occurs, **all involved databases must be consistent**.  
- Uses **Two-Phase Commit (2PC) Protocol** to ensure consistency.

### **Two-Phase Commit (2PC) Protocol:**
**Phase 1: Prepare Phase (Voting Phase)**  
- The coordinator asks all participating databases if they can commit.  
- Each participant responds with **"Yes" (Prepared)** or **"No" (Abort)**.  

**Phase 2: Commit Phase (Execution Phase)**  
- If all participants vote "Yes," the transaction **commits**.  
- If any participant votes "No," the transaction **aborts** and all changes are rolled back.  

### **Example of Multi-Database Recovery:**  
- A banking transaction transfers money **from Bank A to Bank B**.  
- If Bank A **confirms the debit** but Bank B **fails to credit**, inconsistency occurs.  
- Using **2PC**, the transaction ensures that both banks confirm the transaction **before commit**.  
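
The two phases can be sketched as below for the bank transfer example. The `Participant` class, its `can_commit` flag, and the function `two_phase_commit` are hypothetical; a real coordinator would also log its decision and handle timeouts and participant crashes, which this sketch leaves out.

```python
class Participant:
    """Toy participant database in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # whether this site is able to commit
        self.state = "INIT"

    def prepare(self):
        # Phase 1 vote: "Yes" (Prepared) or "No" (Abort).
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    # Phase 1 (Prepare/Voting): ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]

    # Phase 2 (Commit/Execution): commit only on a unanimous "Yes".
    if all(votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.abort()
    return "ABORT"


bank_a = Participant("Bank A")                    # debit can go ahead
bank_b = Participant("Bank B", can_commit=False)  # credit fails
outcome = two_phase_commit([bank_a, bank_b])
```

Because Bank B votes "No", Bank A's prepared debit is rolled back as well, so neither side is left with a half-finished transfer.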

### **Advantages:**  
✅ Guarantees **data consistency** in distributed systems.  
✅ Ensures **atomicity** across multiple databases.  

### **Disadvantages:**  
❌ **High communication overhead** (requires multiple confirmations).  
❌ Risk of **blocking transactions** if a database fails during commit.  

---

## **6. Comparison of Recovery Techniques**
| **Technique** | **Undo Required?** | **Redo Required?** | **Storage Overhead** | **Best Used For** |
|--------------|----------------|----------------|----------------|----------------|
| **Deferred Update** | No | Yes | Moderate | Simple transactions with low failure risk |
| **Immediate Update** | Yes | Yes | Low | High-frequency updates requiring fast processing |
| **Shadow Paging** | No | No | High | High-reliability databases requiring fast recovery |
| **Multi-Database Recovery (2PC)** | Yes | Yes | High | Distributed databases requiring strong consistency |

---

## **7. Conclusion**
- **Deferred Update** is ideal for systems where failures are rare and transactions commit frequently.  
- **Immediate Update** is best for **high-performance applications**, but requires logging for rollback.  
- **Shadow Paging** is great for **fast recovery**, but is inefficient for large databases.  
- **Multi-Database Recovery (2PC)** ensures consistency in **distributed databases**, but at a higher communication cost.  

Each recovery technique has its own use cases, and in real-world systems, **a combination of techniques** is often used to achieve optimal performance and reliability. 🚀