## **DBMS – Backup & Recovery/1: Backup/1**

### **Learning Outcomes**

1. **Understand the need for backup** – why databases require backups and situations demanding them.
2. **Explore various strategies of backup** – full, incremental, and differential backups, their advantages/disadvantages, and selection based on business needs.

---

## **1. Introduction & Context**

* Previous modules covered **transactions** and their **ACID properties**:

  * **Atomicity**, **Consistency**, **Isolation**, **Durability**
* Covered **concurrent transactions** and **serializability issues**.

  * **Conflict serializability**: tested by checking that the precedence graph is acyclic (polynomial time)
  * **View serializability**: testing is NP-complete, so practical checks rely on ad-hoc methods
* Recovery from failures:

  * **Cascaded** & **cascadeless rollback**
* Locking for isolation → throughput improvement but deadlocks possible → detection/prevention/recovery strategies.
* Now moving to **Backup & Recovery**:
  Restores the database to a consistent state after **failures beyond the system's control**.

---

## **2. Need for Backup**

Backup: **Representative copy** of the database containing all necessary data (tables, control files, logs, etc.) to restore in case of failure.

### **Reasons for Backup**

1. **Disaster Recovery**

   * Natural disasters, hardware failure, corruption.
   * Enables restoring to the latest consistent state.
2. **Business Process Changes**

   * Developers may need older database versions for testing or rollback after changes.
3. **Auditing & Compliance**

   * Financial fraud investigations, legal disputes.
   * Need historical snapshots for evidence.
4. **Minimizing Downtime**

   * Without backups, recovery takes much longer.
   * Downtime → severe business impact.

---

## **3. Types of Data to Back Up**

1. **Business Data**

   * Core database content: client info, employee records, sales, course details, rules, etc.
2. **System Data**

   * Database environment/configurations: system catalogs, log files, software dependencies, disk images.
3. **Media Data**

   * Large binary objects (BLOBs): images, videos, audio, graphics.
   * Usually much larger in size than textual database data.

---

## **4. Backup Strategies**

Backup strategies answer:

* **How much** to back up
* **How often** to back up
* **What** to back up

---

### **4.1 Full Backup**

**Definition:** Complete replica of the database at a given point in time (tables, procedures, views, system files, etc.)

**When to Take Full Backup:**

* Before any major database operation
* Before switching to other backup methods
* Frequency depends on:

  * Application type (e.g., 24/7 banking systems may avoid frequent full backups due to downtime)
  * Storage capacity & admin skill availability

**Advantages:**

* Simple setup and maintenance
* Easy recovery (single backup set needed)
* Independent backups (loss of one full backup doesn’t affect others)

**Disadvantages:**

* High storage requirement
* Long downtime during backup
* Not always feasible for very large or constantly updated systems
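
As a small illustration of "a complete replica at a point in time", the sketch below takes a full backup of a toy SQLite database with Python's standard `sqlite3` module. The `course` table and its rows are hypothetical, and a production DBMS would use its own backup tooling; this only shows the idea that one backup set is enough to restore.

```python
import sqlite3


def full_backup(source: sqlite3.Connection, target: sqlite3.Connection) -> None:
    """Copy every page of the source database into the target."""
    # pages=-1 (the default) copies the whole database in a single step.
    source.backup(target, pages=-1)


# Hypothetical toy database held in memory.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, title TEXT)")
source.executemany("INSERT INTO course (title) VALUES (?)", [("DBMS",), ("Networks",)])
source.commit()

backup = sqlite3.connect(":memory:")
full_backup(source, backup)

# Later changes (or damage) to the source do not affect the backup copy.
source.execute("DELETE FROM course")
source.commit()

restored_titles = [row[0] for row in backup.execute("SELECT title FROM course ORDER BY id")]
print(restored_titles)
```

The backup is independent of the source: after the delete, the source is empty while the backup still holds both rows.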

---

### **4.2 Incremental Backup**

**Definition:** Backs up **only** the data that changed since the **last backup** (full or incremental).

**Example:**

* Friday: Full backup
* Saturday: Incremental (changes since Friday)
* Sunday: Incremental (changes since Saturday)
* Monday: Incremental (changes since Sunday), etc.
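
The Friday/Saturday/Sunday pattern above can be sketched with a toy in-memory "database". This is a simplified model, not how a real DBMS tracks changes: it assumes each row carries the day it was last modified, that backups run at the end of each day, and that rows are never deleted. The names `Database`, `full_backup`, and `incremental_backup` are made up for the illustration.

```python
from typing import Dict, Tuple

# Toy "database": key -> (value, day_last_modified). Days are integers, Friday = 0.
Database = Dict[str, Tuple[str, int]]


def full_backup(db: Database) -> Dict[str, str]:
    """Copy every row, regardless of when it changed."""
    return {key: value for key, (value, _) in db.items()}


def incremental_backup(db: Database, last_backup_day: int) -> Dict[str, str]:
    """Copy only the rows modified after the previous backup (full or incremental)."""
    return {key: value for key, (value, day) in db.items() if day > last_backup_day}


db: Database = {"a": ("a0", 0), "b": ("b0", 0), "c": ("c0", 0)}
friday_full = full_backup(db)                                # day 0: everything

db["a"] = ("a1", 1)                                          # Saturday change
saturday_incr = incremental_backup(db, last_backup_day=0)    # only "a"

db["b"] = ("b2", 2)                                          # Sunday changes
db["d"] = ("d2", 2)
sunday_incr = incremental_backup(db, last_backup_day=1)      # only "b" and "d"

print(len(friday_full), len(saturday_incr), len(sunday_incr))
```

Each incremental holds only what changed since the backup just before it, which is why it stays small.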

**Advantages:**

* Minimal storage usage
* Short backup time
* Cost-efficient

**Disadvantages:**

* Recovery requires:

  * The last full backup **plus** all subsequent incremental backups
* Failure/loss of one incremental backup → incomplete recovery
* Slower recovery compared to full backups
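
The recovery cost can be seen in a minimal sketch: the full backup is restored first, then every incremental is applied oldest first. Backups are modelled as plain dictionaries and a lost backup as `None`; both are assumptions made for the illustration.

```python
from typing import Dict, List, Optional


def restore(full: Dict[str, str], incrementals: List[Optional[Dict[str, str]]]) -> Dict[str, str]:
    """Rebuild the database from a full backup plus every later incremental, oldest first."""
    state = dict(full)
    for position, increment in enumerate(incrementals, start=1):
        if increment is None:
            # A lost or corrupt incremental breaks the chain: later sets cannot be applied safely.
            raise ValueError(f"incremental #{position} is missing; recovery is incomplete")
        state.update(increment)
    return state


full = {"a": "a0", "b": "b0"}
saturday = {"a": "a1"}
sunday = {"b": "b2", "c": "c2"}

restored = restore(full, [saturday, sunday])
print(restored)

try:
    restore(full, [None, sunday])  # Saturday's incremental was lost
    chain_broken = False
except ValueError as error:
    chain_broken = True
    print(error)
```

Every incremental in the chain must be present and applied in order, which is where both the slower recovery and the chain risk come from.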

---

### **4.3 Differential Backup**

**Definition:** Backs up all changes made **since the last full backup**, regardless of intermediate incrementals.

**Example Strategy:**

* Friday: Full backup
* Saturday: Incremental
* Sunday: Incremental
* Monday: Incremental
* Tuesday: Differential (since Friday)
* Wednesday: Incremental (since Tuesday differential)
* Thursday: Incremental, then Friday: Full backup

**Advantages:**

* Fewer backup sets needed for recovery:

  * Full backup + latest differential + incrementals taken after that differential
* Faster recovery than pure incremental approach
* Suitable when full backups are infrequent (e.g., monthly)

**Disadvantages:**

* Larger storage than incremental backups
* Size of differential may grow close to a full backup if changes are large
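
To make the "fewer backup sets" point concrete, the sketch below works out which sets are needed to restore to the end of the example week. The `restore_chain` helper and the `(day, kind)` tuples are hypothetical; the function assumes the history is in chronological order and contains at least one full backup.

```python
from typing import List, Tuple

Backup = Tuple[str, str]  # (day label, kind): "full", "differential" or "incremental"


def restore_chain(history: List[Backup]) -> List[Backup]:
    """Return the backup sets needed to restore to the most recent backup in history."""
    # Start from the most recent full backup.
    last_full = max(i for i, (_, kind) in enumerate(history) if kind == "full")
    chain = [history[last_full]]
    tail = history[last_full + 1:]

    # The latest differential already contains every change since the full backup,
    # so anything taken before it can be skipped.
    diff_positions = [i for i, (_, kind) in enumerate(tail) if kind == "differential"]
    if diff_positions:
        latest_diff = diff_positions[-1]
        chain.append(tail[latest_diff])
        tail = tail[latest_diff + 1:]

    # What remains are the incrementals taken after the differential (or the full).
    chain.extend(backup for backup in tail if backup[1] == "incremental")
    return chain


week = [
    ("Fri", "full"), ("Sat", "incremental"), ("Sun", "incremental"),
    ("Mon", "incremental"), ("Tue", "differential"),
    ("Wed", "incremental"), ("Thu", "incremental"),
]
mixed = restore_chain(week)

# Same week with incrementals only, for comparison.
pure = restore_chain([(day, "full" if i == 0 else "incremental") for i, (day, _) in enumerate(week)])

print(len(mixed), len(pure))
```

With the Tuesday differential, a Thursday restore needs 4 sets (Friday full, Tuesday differential, Wednesday and Thursday incrementals) instead of all 7.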

---

## **5. Choosing a Backup Strategy**

* **Granularity trade-off:**

  * **Full Backup** → Coarsest granularity (everything is copied each time), simplest recovery, high cost
  * **Incremental Backup** → Finest granularity (only changes since the previous backup), low cost, complex recovery
  * **Differential Backup** → Middle ground, balance between cost & recovery complexity
* Business requirements, storage limits, and acceptable downtime influence the choice.
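
A rough storage comparison shows the cost side of this trade-off. The numbers are purely hypothetical (a 100 GB database with 5 GB of changes per day), and the sketch assumes each day changes different data, that every backup set is kept, and that nothing is compressed.

```python
def weekly_storage_gb(db_size_gb, daily_change_gb, days=7):
    """Total backup storage for one cycle that starts with a full backup on day 1."""
    return {
        "full every day": db_size_gb * days,
        "full + incrementals": db_size_gb + daily_change_gb * (days - 1),
        # The k-th differential holds k days of accumulated changes.
        "full + differentials": db_size_gb + sum(daily_change_gb * k for k in range(1, days)),
    }


totals = weekly_storage_gb(db_size_gb=100, daily_change_gb=5)
for strategy, size in totals.items():
    print(f"{strategy}: {size} GB")
```

Under these assumptions the weekly totals are 700 GB, 130 GB, and 205 GB respectively: differentials cost more than incrementals but far less than daily full backups.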

---

## **6. Case Study: Monthly Backup Schedule**

**Assumption:**

* Heavier backups on Sundays (low business activity)

**Example Pattern:**

1. **First Sunday** → Full backup
2. **Subsequent Sundays** → Differential backup
3. **All other days (Monday to Saturday)** → Incremental backups

**Recovery Requirement:**

* At most 1 full backup + 1 differential backup + 6 incremental backups (8 backup sets)
* Compared to **full + daily incrementals** (up to 31 backup sets in a 31-day month), this reduces recovery effort and risk.
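
The counts above can be checked with a short sketch. It assumes a 31-day month whose first day is a Sunday, one backup per day, and an incremental on every non-Sunday; `sets_needed` is a hypothetical helper that walks backwards from the day being restored.

```python
def sets_needed(day, kinds):
    """Number of backup sets needed to restore to the end of `day` (1-based)."""
    needed = 0
    for d in range(day, 0, -1):          # walk backwards from the target day
        needed += 1
        if kinds[d - 1] == "full":
            return needed
        if kinds[d - 1] == "differential":
            return needed + 1            # plus the full backup it is based on
    raise ValueError("no full backup found")


DAYS = 31  # assumed month length; day 1 is assumed to be a Sunday

mixed = [
    "full" if d == 1 else "differential" if (d - 1) % 7 == 0 else "incremental"
    for d in range(1, DAYS + 1)
]
pure = ["full"] + ["incremental"] * (DAYS - 1)

mixed_worst = max(sets_needed(d, mixed) for d in range(1, DAYS + 1))
pure_worst = max(sets_needed(d, pure) for d in range(1, DAYS + 1))
print(mixed_worst, pure_worst)
```

The worst case for the mixed schedule is a Saturday following a differential Sunday (full + differential + 6 incrementals = 8 sets), against 31 sets on the last day of the pure incremental schedule.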

---

## **7. Cold Backup vs. Hot Backup**

### **Cold Backup**

* Taken when the database is offline or in minimal use.
* All earlier strategies (full/incremental/differential) generally describe cold backups.

### **Hot Backup**

* Taken while the database is online and accessible to users.
* Essential for:

  * Real-time, 24/7 systems (e.g., stock markets, online banking)
  * Systems with highly dynamic data
* **Advantages:**

  * Continuous availability
  * Easier point-in-time recovery
* **Disadvantages:**

  * Complex setup
  * Less fault-tolerant; an error can terminate backup
  * High maintenance and cost
  * Difficult for huge, monolithic datasets

---

## **8. Transaction Log Backup (for Hot Backup)**

* **Transaction Logging:**
  Records each database operation (read, write, commit, abort) sequentially.
* Transaction logs are **much smaller** than the entire database.
* Used for **point-in-time recovery**:

  * Combine a backup (full/differential) with logs to restore up to a specific moment.
* Old logs can be discarded once the changes they record are reflected in the main database and captured by a later backup.
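
Point-in-time recovery can be sketched as "restore the backup, then replay the log up to the chosen moment". The `LogRecord` layout, the integer timestamps, and the redo-only replay below are simplifying assumptions, not the log format of any particular DBMS. The log is assumed to start right after the backup was taken and to be in chronological order.

```python
from typing import Dict, List, NamedTuple, Optional


class LogRecord(NamedTuple):
    time: int                    # hypothetical logical timestamp
    txn: str                     # transaction identifier
    op: str                      # "write", "commit" or "abort"
    key: Optional[str] = None
    value: Optional[str] = None


def recover_to(backup: Dict[str, str], log: List[LogRecord], target_time: int) -> Dict[str, str]:
    """Restore the backup, then redo writes of transactions that committed by target_time."""
    usable = [record for record in log if record.time <= target_time]
    committed = {record.txn for record in usable if record.op == "commit"}
    state = dict(backup)
    for record in usable:        # replay in log order
        if record.op == "write" and record.txn in committed:
            state[record.key] = record.value
    return state


backup = {"balance": "100"}
log = [
    LogRecord(1, "T1", "write", "balance", "150"),
    LogRecord(2, "T1", "commit"),
    LogRecord(3, "T2", "write", "balance", "0"),
    LogRecord(4, "T2", "abort"),
    LogRecord(5, "T3", "write", "balance", "175"),
    LogRecord(6, "T3", "commit"),
]

at_time_4 = recover_to(backup, log, target_time=4)
at_time_5 = recover_to(backup, log, target_time=5)
at_time_6 = recover_to(backup, log, target_time=6)
print(at_time_4, at_time_5, at_time_6)
```

Writes from the aborted `T2`, and from `T3` before its commit record, are skipped, so the restored state at times 4 and 5 reflects only `T1`.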

---

## **9. Summary Table of Backup Strategies**

| **Type**         | **What It Backs Up**                            | **Advantages**                               | **Disadvantages**                          | **Best Use Case**                  |
| ---------------- | ----------------------------------------------- | -------------------------------------------- | ------------------------------------------ | ---------------------------------- |
| **Full**         | Entire database                                 | Simple recovery, independent backups         | High storage & downtime                    | Small/medium DBs, weekly backups   |
| **Incremental**  | Changes since last backup (full or incremental) | Low storage, fast backup                     | Complex/slower recovery, backup chain risk | Daily backups with weekly full     |
| **Differential** | Changes since last full backup                  | Faster recovery than incremental, fewer sets | Larger than incremental, grows with time   | Large DBs, infrequent full backups |