# 📘 DBMS – L9.1: Indexing and Hashing/1: Indexing/1

## 🎯 **Learning Outcomes**

* Understand the reasons for which we need to index database tables.
* Learn about ordered indexes and the Indexed Sequential Access Method (ISAM).

---

## 🧠 Why Do We Need Indexing?

### ✅ Problem:

* A database table (e.g., faculty with `name` and `phone_number`) is stored as a file of records.
* Searching for a record by `name` or `phone_number` requires a **linear scan**, O(n), unless the file is sorted on that attribute.
* Sorting the file on one attribute allows **binary search**, O(log n), on that attribute only; a search on the other attribute still needs a linear scan.
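
The contrast between the two searches can be sketched as follows. This is a minimal illustration, assuming a small in-memory list stands in for the file; the records and the helper names `find_by_phone` and `find_by_name` are made up for the example.

```python
# Hypothetical faculty file: (name, phone_number), physically sorted by name.
faculty = [
    ("Asha", "555-0104"),
    ("Bala", "555-0101"),
    ("Chen", "555-0103"),
    ("Devi", "555-0102"),
]

def find_by_phone(records, phone):
    """Linear scan, O(n): the file is not sorted on phone_number."""
    for rec in records:
        if rec[1] == phone:
            return rec
    return None

def find_by_name(records, name):
    """Binary search, O(log n): valid only because the file is sorted on name."""
    lo, hi = 0, len(records)
    while lo < hi:
        mid = (lo + hi) // 2
        if records[mid][0] < name:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(records) and records[lo][0] == name:
        return records[lo]
    return None
```

Swapping the sort order to `phone_number` would simply move the linear scan to the `name` search, which is the limitation the next subsection describes.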

### ✅ Limitation:

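* A file can be physically sorted on only one attribute (or one attribute combination) at a time.
* Keeping the file sorted makes `insert` and `delete` expensive, because records must be moved to preserve the order.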

### ✅ Solution:

* Use **indexing** to:

  * Allow **fast search (O(log n))** even when the data file itself is **unsorted**.
  * Create **separate index structures** for different fields (e.g., one for name, one for phone number).
  * Enable efficient search on **any indexed attribute** without disturbing the physical data layout.

---

## 📚 What is Indexing?

### 📌 Definition:

Indexing is a technique to improve the speed of data retrieval operations on a database table by using **auxiliary lookup structures**.

### 📌 Index File:

* A **separate file** of entries, each holding a search-key value and a **pointer** to the corresponding record.
* Entries are kept **sorted by key**, so the index can be binary searched regardless of the order of the main data file.

---

## 🧱 Indexed Sequential Access Method (ISAM)

### 🔹 Core Idea:

* Keep **main data file unordered**.
* Maintain one or more **sorted index files** with key + pointer to actual record.

### 🔹 Example:

* Main file: faculty(name, phone)
* Index file 1: sorted by `name`, points to record number in main file
* Index file 2: sorted by `phone_number`, same structure

### 🔹 Benefits:

* Can search any field (name/phone) in O(log n) if indexed.
* Doesn’t change the original data layout.
* Can create **multiple indexes** on different fields.
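
A minimal sketch of this arrangement, assuming an in-memory list plays the role of the main file and the list position serves as the record number. The sample records and the helper names `build_index` and `lookup` are illustrative, not taken from the lecture.

```python
from bisect import bisect_left

# Main file in insertion order (unordered); list position = record number.
main_file = [
    ("Meera", "555-0103"),
    ("Arun", "555-0104"),
    ("Zoya", "555-0101"),
    ("Kiran", "555-0102"),
]

def build_index(records, field):
    """Return a sorted list of (key, record_number) pairs for one field."""
    return sorted((rec[field], rid) for rid, rec in enumerate(records))

def lookup(index, records, key):
    """Binary-search the index, then follow the pointer into the main file."""
    # (key, -1) sorts before every real entry with the same key.
    i = bisect_left(index, (key, -1))
    if i < len(index) and index[i][0] == key:
        return records[index[i][1]]
    return None

name_index = build_index(main_file, 0)    # index file 1: sorted by name
phone_index = build_index(main_file, 1)   # index file 2: sorted by phone_number
```

Both `lookup(name_index, main_file, "Kiran")` and `lookup(phone_index, main_file, "555-0101")` run in O(log n) on the index, while `main_file` keeps its original order.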

---

## 🏷️ Types of Indexes

### 1. **Primary Index (Clustering Index)**

* Applied when **main data file is sorted** based on the search key.
* Search key is often the **primary key**, but not necessarily.
* May hold one entry per **record** (dense) or one entry per **data block** (sparse).

### 2. **Secondary Index (Non-clustering Index)**

* Applied when data file is **not sorted** based on the search key.
* Supports **non-primary key fields** (e.g., `salary`, `department_name`).
* Always **dense**: the main file is not ordered on this key, so records near an indexed entry tell us nothing about where other key values are stored.

---

## 🧠 Dense vs Sparse Indexing

### ✅ Dense Index:

* Contains **every search key value** with its corresponding pointer.
* E.g., all names or all phone numbers.
* Pros: Any key is located directly from the index, with no scan of the data file.
* Cons: Larger index and higher maintenance cost on every insert and delete.

### ✅ Sparse Index:

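* Contains entries for only **some search-key values**, e.g., **1 out of every 5** records.
* Applicable only when the data file is **sorted on the search key** (i.e., with a primary index).
* Trades speed for space: the index is smaller and cheaper to maintain, but lookups are slower than with a dense index.
* To find key K: locate the index entry with the **largest key ≤ K**, then scan the file sequentially from the record it points to.
* Typically stores a pointer to the **first record of each block**.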
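
The sparse lookup can be sketched as below, assuming a data file sorted on the key and one index entry per block. The block size, the sample keys, and the name `sparse_lookup` are assumptions made for the illustration.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # records per block (illustrative value)

# Data file sorted on the search key, split into fixed-size blocks.
sorted_keys = [10, 20, 30, 40, 50, 60, 70, 80]
blocks = [sorted_keys[i:i + BLOCK_SIZE]
          for i in range(0, len(sorted_keys), BLOCK_SIZE)]

# Sparse index: first key of each block; list position = block number.
index_keys = [block[0] for block in blocks]

def sparse_lookup(key):
    """Return (block_number, offset) for key, or None if it is absent."""
    # Find the index entry with the largest key <= the search key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every indexed key
    # Sequential scan inside the one candidate block.
    for offset, k in enumerate(blocks[pos]):
        if k == key:
            return (pos, offset)
    return None
```

Here the index holds 3 entries for 8 records; `sparse_lookup(50)` returns `(1, 1)` after scanning only the second block.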

---

## 📦 Block-Level Indexing

* Index on the **first record of each block**.
* Enables fast identification of **likely block** containing the key.
* Once block is found, linear or other search is done inside the block.

---

## 📚 Secondary Index with Duplicate Values

### ✅ Challenge:

* When multiple records have **same search key** (e.g., salary = 80000).
* Index must store **pointers to all such records**.

### ✅ Solution:

* Use **one entry per search key value**, pointing to a **list** of record addresses.
* Helps avoid repetition in index file.
* Example: `80000 → [record_3, record_7]`
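
One way to picture the bucket-per-key layout is the sketch below. It assumes an in-memory instructor file where the list position is the record number; the names, salaries, and the helper `records_with_salary` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical instructor file, not sorted on salary; position = record number.
instructors = [
    ("Das", 90000), ("Rao", 65000), ("Iyer", 72000), ("Khan", 80000),
    ("Nair", 65000), ("Bose", 95000), ("Sen", 60000), ("Jain", 80000),
]

# One bucket per distinct salary, holding every matching record number.
buckets = defaultdict(list)
for rid, (_, salary) in enumerate(instructors):
    buckets[salary].append(rid)

# The index proper: distinct keys in sorted order, each owning one bucket.
sorted_salaries = sorted(buckets)

def records_with_salary(salary):
    """Binary-search the distinct keys, then follow every pointer in the bucket."""
    i = bisect_left(sorted_salaries, salary)
    if i < len(sorted_salaries) and sorted_salaries[i] == salary:
        return [instructors[rid] for rid in buckets[salary]]
    return []
```

With this data, the bucket for 80000 is `[3, 7]`, so the key appears once in the index even though two records share it.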

---

## 📐 Design Considerations and Trade-offs

| Criteria                 | Indexing Impact                            |
| ------------------------ | ------------------------------------------ |
| **Access Speed**         | Improved with indexing                     |
| **Insert/Delete/Update** | More expensive due to index maintenance    |
| **Storage Overhead**     | Higher (especially for dense indexes)      |
| **Query Optimization**   | Requires careful selection of index fields |

* Use **indexing wisely** after analyzing **access patterns** and **query frequency**.

---

## 🏗️ Multi-Level Indexing

### Problem:

* The index file itself can become too large to fit in memory, so searching it requires many disk reads.

### Solution:

* **Index on the index** (recursive idea).

### Structure:

```
Outer Index (Sparse)
  ↓
Inner Index (Dense)
  ↓
Data Blocks (Main file)
```

* Multi-level indexing maintains **scalability** and **performance**.
* Access cost = binary search of the outer index + search within one inner-index block + scan of one data block.
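
The three-step lookup through this structure might look like the following sketch. It assumes a dense inner index stored in blocks of three entries and a sparse outer index over those blocks; the data values, block sizes, and the name `multilevel_lookup` are all illustrative.

```python
from bisect import bisect_left, bisect_right

INNER_BLOCK_SIZE = 3  # index entries per inner-index block (illustrative)

# Main file: data blocks in arbitrary order.
data_blocks = [[42, 7], [19, 88], [63, 5], [30, 51]]

# Inner index (dense): one (key, data_block_number) entry per record, sorted.
inner_entries = sorted((key, bno)
                       for bno, blk in enumerate(data_blocks) for key in blk)
inner_blocks = [inner_entries[i:i + INNER_BLOCK_SIZE]
                for i in range(0, len(inner_entries), INNER_BLOCK_SIZE)]

# Outer index (sparse): first key of each inner-index block.
outer_index = [(blk[0][0], n) for n, blk in enumerate(inner_blocks)]
outer_keys = [k for k, _ in outer_index]

def multilevel_lookup(key):
    """Return the data block number holding key, or None if it is absent."""
    # Step 1: outer index, largest entry <= key picks one inner block.
    o = bisect_right(outer_keys, key) - 1
    if o < 0:
        return None
    inner = inner_blocks[outer_index[o][1]]
    # Step 2: exact match inside that inner block (dense, so key must appear).
    i = bisect_left(inner, (key, -1))
    if i < len(inner) and inner[i][0] == key:
        return inner[i][1]  # Step 3 would read this data block.
    return None
```

Only the small outer index needs to stay in memory; each lookup then touches one inner-index block and one data block.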

---

## 🔄 Index Update Strategies

### 🔹 Deletion:

**Dense Index:**

* Delete the data record; remove its index entry as well if no other record has that search key.

**Sparse Index:**

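* If the deleted record's key appears in the index:

  * Replace the entry with the **next search-key value** in file order. If that key already has its own index entry, simply delete the entry instead.
* If the key does not appear in the index:

  * No change to the index is needed.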

### 🔹 Insertion:

**Dense Index:**

* Insert the new record and add its search key to the index if it is not already present.

**Sparse Index:**

* If the new record belongs to an **existing block range**, no change needed.
* If a **new block** is created (e.g., overflow), update sparse index with first record of the new block.

### 🔹 Secondary Index:

* Index entry must be created **for every new record**.
* Must update the **list of pointers** for keys with multiple values.
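
The sparse-index deletion rule is the least obvious of these, so here is a minimal sketch of it. It assumes unique keys, a sorted list standing in for the data file, and a sorted list of the keys that currently have index entries. It also applies the standard refinement that an entry is dropped, not replaced, when the next key is already indexed. The function name is hypothetical.

```python
from bisect import bisect_left

def delete_with_sparse_index(data, sparse, key):
    """Delete key from the sorted file `data` and fix the sparse index.

    `sparse` is the sorted list of keys that have index entries.
    Both lists are modified in place. Returns False if key is absent.
    """
    i = bisect_left(data, key)
    if i == len(data) or data[i] != key:
        return False
    del data[i]  # data[i] is now the next key in file order, if any

    j = bisect_left(sparse, key)
    if j < len(sparse) and sparse[j] == key:
        del sparse[j]  # the deleted record was an index entry
        has_next = i < len(data)
        next_already_indexed = j < len(sparse) and has_next and sparse[j] == data[i]
        if has_next and not next_already_indexed:
            sparse.insert(j, data[i])  # replace with the next key
    return True
```

For example, with `data = [10, 20, 30, 40, 50, 60]` and `sparse = [10, 40]`, deleting 40 leaves `sparse == [10, 50]`, while deleting 20 leaves the index untouched.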

---

## 📊 Performance Metrics for Indexing

1. **Access time** – How fast can we retrieve data?
2. **Insertion time** – Time to add new records and update index.
3. **Deletion time** – Time to remove records and update index.
4. **Storage overhead** – Space required for maintaining index files.

---

## 📈 Query Planning and Statistics

* Indexing strategy should be based on **expected queries**.
* Database collects **statistics** (via system catalog) over time:

  * Frequency of different queries.
  * Popular search fields.
* Use these statistics to periodically **redesign indexing strategy**.
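
As a rough illustration of statistics-driven index selection, the sketch below tallies which field each query filtered on and suggests the frequently searched ones. The query log, the 25% threshold, and the function `suggest_index_fields` are assumptions for the example; a real DBMS exposes such statistics through its own catalog.

```python
from collections import Counter

# Hypothetical log: the field each recorded query filtered on.
query_log = ["name", "phone_number", "name", "salary", "name", "phone_number"]

def suggest_index_fields(log, min_share=0.25):
    """Return fields searched in at least `min_share` of queries, most frequent first."""
    counts = Counter(log)
    total = len(log)
    return [field for field, n in counts.most_common() if n / total >= min_share]
```

With this log, `name` (3 of 6 queries) and `phone_number` (2 of 6) clear the threshold, while `salary` does not; the threshold itself is a tuning choice that weighs lookup gains against index maintenance cost.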

---

## ✅ Summary

| Concept                    | Key Points                                                |
| -------------------------- | --------------------------------------------------------- |
| **Indexing**               | Enables fast search without sorting the data file         |
| **Ordered Index**          | Sorted index file (used in ISAM)                          |
| **Primary Index**          | Based on data file’s physical order                       |
| **Secondary Index**        | Independent of data file’s order, supports non-key fields |
| **Dense Index**            | Every search-key value has an entry                       |
| **Sparse Index**           | Only some entries stored, usually per block               |
| **Multi-Level Index**      | Index on index, improves scalability                      |
| **Performance Trade-offs** | Space vs Speed vs Maintenance                             |
| **Query-driven Design**    | Use query patterns + statistics to plan indexes           |