**Subject: Database Management Systems (DBMS)**

**Module: File Structure and Storage (Module 40)**

---

### **1. Organization of Database Files**

#### **Hierarchical Containment Structure**

* **Database** → Collection of **Files**
* **File** → Sequence of **Records**
* **Record** → Sequence of **Fields**

#### **Assumptions for Simplicity (Initial Model)**

* Fixed-size records
* Records of only one type per file
* One file per relation

#### **Block-based Storage**

* Disk storage is partitioned into **fixed-length blocks**
* Records must not cross block boundaries
* Blocks are the basic unit for:

  * Data allocation
  * Data transfer

---

### **2. Record Management Techniques**

#### **Fixed-Length Records**

* Stored linearly like arrays
* Access via offsets: the `i`-th record (counting from 0) starts at `base_address + i * record_size`
* Records must not span multiple blocks
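
The offset rule and the no-spanning rule can be combined into one lookup. The following is a minimal sketch, assuming a 4096-byte block and a 53-byte record (both sizes, and the name `locate_record`, are illustrative and not taken from the notes), with records numbered from 0:

```python
BLOCK_SIZE = 4096   # assumed block size in bytes
RECORD_SIZE = 53    # assumed fixed record size in bytes


def locate_record(i, block_size=BLOCK_SIZE, record_size=RECORD_SIZE):
    """Return (block_number, byte_offset_in_block) for the i-th record (0-based).

    Only whole records are placed in a block, so the leftover bytes at the
    end of each block stay unused and no record crosses a block boundary.
    """
    records_per_block = block_size // record_size
    if records_per_block == 0:
        raise ValueError("record does not fit in a single block")
    block_number, slot = divmod(i, records_per_block)
    return block_number, slot * record_size


print(locate_record(0))
print(locate_record(76))   # last record that fits in block 0
print(locate_record(77))   # first record of block 1
```

With these sizes each block holds 77 records and the last 15 bytes of every block are left empty.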

#### **Deletion Strategies**

1. **Compaction**

   * Shift every record after the deleted one up by one position
   * Costly: a single deletion may move many records

2. **Swap with Last Record**

   * Move the last record into the freed position
   * Cheaper (only one record moves) but changes record order

3. **Free List** *(Most Common)*

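   * Link deleted records into a chain: the file header stores the address of the first deleted record, and each deleted record stores the address of the next
   * An insertion reuses the first record on the chain; the file grows only when the chain is empty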
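
The free-list strategy can be sketched in a few lines. This is an in-memory model only, assuming slot indexes stand in for record addresses; `FixedRecordFile` and `FreeSlot` are hypothetical names chosen for the illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FreeSlot:
    """Marker left in a deleted slot; points to the next free slot, if any."""
    next_free: Optional[int]


class FixedRecordFile:
    """In-memory model of a fixed-length record file with a free list."""

    def __init__(self):
        self.free_head = None   # plays the role of the file header pointer
        self.slots = []         # each entry is a record or a FreeSlot

    def insert(self, record):
        if self.free_head is not None:
            # Reuse the first slot on the free list.
            idx = self.free_head
            self.free_head = self.slots[idx].next_free
            self.slots[idx] = record
        else:
            # Free list is empty: append at the end of the file.
            idx = len(self.slots)
            self.slots.append(record)
        return idx

    def delete(self, idx):
        if isinstance(self.slots[idx], FreeSlot):
            raise KeyError(f"slot {idx} is already free")
        # Push the freed slot onto the front of the chain.
        self.slots[idx] = FreeSlot(self.free_head)
        self.free_head = idx


f = FixedRecordFile()
for name in ("r0", "r1", "r2"):
    f.insert(name)
f.delete(0)
f.delete(2)
print(f.insert("r3"))   # reuses the most recently freed slot
```

Neither deletion nor insertion moves any existing record, which is why this approach avoids the cost of compaction.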

#### **Variable-Length Records**

* Fields may include types like `VARCHAR` (varying size)
* Structure:

  * Fixed-length section: an (offset, length) pair for each variable-length attribute, plus the values of fixed-length attributes
  * Variable-length data stored after the fixed-length section
  * Null values recorded in a **null bitmap**
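
One possible byte layout for such a record is sketched below. The schema (two `VARCHAR` fields `name` and `dept`, one 4-byte integer `salary`), the field widths, and the function names are all assumptions made for the example:

```python
import struct

# Fixed-length section: (offset, length) for name, (offset, length) for dept,
# a 4-byte salary, then a 1-byte null bitmap. "<" disables padding.
HEADER = struct.Struct("<HHHHiB")


def encode(name, dept, salary):
    """Pack one record; None marks a null attribute."""
    bitmap = 0
    var_data = b""
    pairs = []
    for bit, value in enumerate((name, dept)):
        if value is None:
            bitmap |= 1 << bit          # set the null bit, store no bytes
            pairs += [0, 0]
        else:
            raw = value.encode("utf-8")
            pairs += [HEADER.size + len(var_data), len(raw)]
            var_data += raw
    if salary is None:
        bitmap |= 1 << 2
        salary = 0                      # placeholder, ignored on decode
    return HEADER.pack(*pairs, salary, bitmap) + var_data


def decode(buf):
    """Unpack a record produced by encode()."""
    n_off, n_len, d_off, d_len, salary, bitmap = HEADER.unpack_from(buf, 0)

    def var_field(bit, off, length):
        if bitmap & (1 << bit):
            return None
        return buf[off:off + length].decode("utf-8")

    return (
        var_field(0, n_off, n_len),
        var_field(1, d_off, d_len),
        None if bitmap & (1 << 2) else salary,
    )


record = encode("Srinivasan", None, 65000)
print(len(record), decode(record))
```

A null attribute costs one bit in the bitmap and no bytes in the variable-length area, and any field can be located from the fixed-length section without scanning the others.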

---

### **3. Slotted Page Structure**

* Used to manage multiple variable-size records in a block
* **Page Header** includes:

  * Number of record entries
  * End of free space
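  * Location and size of each record
* Records can be moved within the page to keep free space contiguous (for example, after a deletion); only the header entry is updated
* External pointers refer to the record's header entry rather than its byte position, so they remain valid when records move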
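
The indirection through the header can be illustrated with a small model. This is a sketch under simplifying assumptions: the header is kept as a Python list instead of being serialized into the page, header space is not charged against free space, and `SlottedPage` with its 64-byte default is a hypothetical name and size:

```python
class SlottedPage:
    """Toy slotted page: records grow downward from the end of the page."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.slots = []          # header entries: (offset, length) or None
        self.free_end = size     # end of free space

    def insert(self, record: bytes) -> int:
        if len(record) > self.free_end:
            raise ValueError("not enough free space in page")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.slots.append((self.free_end, len(record)))
        return len(self.slots) - 1          # slot number used by outsiders

    def read(self, slot: int) -> bytes:
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(f"slot {slot} was deleted")
        off, length = entry
        return bytes(self.data[off:off + length])

    def delete(self, slot: int) -> None:
        entry = self.slots[slot]
        if entry is None:
            raise KeyError(f"slot {slot} was deleted")
        off, length = entry
        # Slide the records that sit between free space and the hole so the
        # free area stays contiguous, then fix up their header entries.
        self.data[self.free_end + length:off + length] = self.data[self.free_end:off]
        for i, e in enumerate(self.slots):
            if e is not None and e[0] < off:
                self.slots[i] = (e[0] + length, e[1])
        self.slots[slot] = None
        self.free_end += length


page = SlottedPage()
a = page.insert(b"alpha")
b = page.insert(b"be")
c = page.insert(b"gamma!")
page.delete(b)
print(page.read(a), page.read(c), page.free_end)
```

After the deletion, the record in slot `c` has moved inside the page, yet it is still read through the same slot number.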

---

### **4. File Organization Methods**

#### **Heap File**

* No ordering
* Records inserted wherever space is available

#### **Sequential File**

* Ordered by a **search key**
* Efficient for processing records in search-key order, including binary search and range queries
* Insertions that do not fit in place go to **overflow blocks**
* Needs periodic **reorganization** to restore physical search-key order
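
The interplay of the ordered area, overflow, and reorganization can be sketched as follows. This is a simplified in-memory model, assuming records are `(key, payload)` pairs and the overflow area is a plain unsorted list; `SequentialFile` is a hypothetical name, and real systems typically chain records with pointers instead:

```python
from bisect import bisect_left, bisect_right


class SequentialFile:
    """Sorted main area plus an unsorted overflow area."""

    def __init__(self, records):
        self.main = sorted(records, key=lambda r: r[0])
        self.keys = [k for k, _ in self.main]   # search keys, in order
        self.overflow = []

    def insert(self, key, payload):
        # New records go to overflow so the main area is not shifted.
        self.overflow.append((key, payload))

    def range_query(self, lo, hi):
        # Binary search finds the matching run in the ordered area.
        start = bisect_left(self.keys, lo)
        end = bisect_right(self.keys, hi)
        hits = self.main[start:end]
        # The overflow area is unordered and must be scanned in full.
        hits += [r for r in self.overflow if lo <= r[0] <= hi]
        return sorted(hits, key=lambda r: r[0])

    def reorganize(self):
        # Merge overflow back so the file is physically ordered again.
        self.main = sorted(self.main + self.overflow, key=lambda r: r[0])
        self.keys = [k for k, _ in self.main]
        self.overflow = []


f = SequentialFile([(30, "c"), (10, "a"), (50, "e")])
f.insert(20, "b")
print(f.range_query(10, 30))
f.reorganize()
print(f.keys)
```

The longer reorganization is postponed, the larger the unordered overflow area becomes and the more each query has to scan.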

#### **Multi-Table Clustering**

* Stores records from different relations together based on relationships
* Good for queries involving joins or many-to-one relationships
* **Example:** Grouping `instructor` records under each `department`
* Drawback: slower for queries on a single relation, because its records are interleaved with those of other relations
* Results in **variable-length** records, and a block may hold records of more than one type

---

### **5. Data Dictionary (System Catalog)**

* Stores **metadata** (data about data)
* Essential for:

  * Relation definitions
  * Attribute types and positions
  * Indexes
  * Users, roles, and permissions
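* Consulted on almost every operation, so frequently used parts are typically kept in **main memory** for fast access; the catalog itself is stored persistently on disk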

#### **Typical Metadata Tables**

1. **Relation Metadata**

   * Relation name, number of attributes, storage details

2. **Attribute Metadata**

   * Attribute name, domain type, position, length

3. **Index Metadata**

   * Index name, relation name, indexed attributes

4. **View Metadata**, **User Metadata**, **Permission Metadata**
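
As an illustration of how attribute metadata is used, the sketch below derives a field's byte offset inside a fixed-length record from catalog entries. The attribute names, domain types, and lengths for `instructor` are hypothetical, as are `AttributeMeta` and `field_offset`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeMeta:
    relation: str
    name: str
    domain: str
    position: int   # 1-based position within the relation
    length: int     # bytes occupied in a fixed-length record


# Hypothetical catalog contents for one relation.
ATTRIBUTE_CATALOG = [
    AttributeMeta("instructor", "ID", "char(5)", 1, 5),
    AttributeMeta("instructor", "name", "char(20)", 2, 20),
    AttributeMeta("instructor", "dept_name", "char(20)", 3, 20),
    AttributeMeta("instructor", "salary", "numeric(8,2)", 4, 8),
]


def field_offset(relation, attribute, catalog=ATTRIBUTE_CATALOG):
    """Byte offset of an attribute within a record of the given relation."""
    attrs = sorted(
        (a for a in catalog if a.relation == relation),
        key=lambda a: a.position,
    )
    offset = 0
    for a in attrs:
        if a.name == attribute:
            return offset
        offset += a.length
    raise KeyError(f"{relation}.{attribute} is not in the catalog")


print(field_offset("instructor", "salary"))
```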

---

### **6. Access Mechanism and Buffer Management**

#### **Memory Hierarchy**

* Disk (slow, large) → Buffer (RAM, fast, small) → CPU
* **Block Transfer** from disk to buffer for operations

#### **Buffer Management Workflow**

1. Program requests a block
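2. Buffer manager checks whether the block is already in the buffer; if so, it returns the block's address in main memory
3. If not:

   * If the buffer is full, an existing block is chosen for replacement
   * The replaced block is written back to disk only if it was modified since it was last written
   * The requested block is read from disk into the buffer, and its address is returned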
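
The request workflow can be sketched as a small class. This is a minimal model, assuming a dictionary stands in for the disk, LRU is the replacement policy, and a dirty flag tracks modification; `BufferManager` and its method names are hypothetical:

```python
from collections import OrderedDict


class BufferManager:
    """Toy buffer pool with LRU replacement and write-back of dirty blocks."""

    def __init__(self, disk, capacity):
        self.disk = disk                # dict: block_id -> bytes
        self.capacity = capacity
        self.frames = OrderedDict()     # block_id -> [data, dirty], LRU first
        self.disk_reads = 0
        self.disk_writes = 0

    def get(self, block_id):
        if block_id in self.frames:
            # Hit: no disk access, just mark as most recently used.
            self.frames.move_to_end(block_id)
            return self.frames[block_id][0]
        if len(self.frames) >= self.capacity:
            # Miss with a full buffer: evict the least recently used block.
            victim, (data, dirty) = self.frames.popitem(last=False)
            if dirty:
                self.disk[victim] = data    # write back only if modified
                self.disk_writes += 1
        data = self.disk[block_id]
        self.disk_reads += 1
        self.frames[block_id] = [data, False]
        return data

    def write(self, block_id, data):
        self.get(block_id)                  # make sure the block is buffered
        self.frames[block_id] = [data, True]


disk = {1: b"a", 2: b"b", 3: b"c"}
bm = BufferManager(disk, capacity=2)
bm.get(1)
bm.write(2, b"B")
bm.get(1)        # hit
bm.get(3)        # evicts block 2, which is dirty
print(disk[2], bm.disk_reads, bm.disk_writes)
```

The modified block reaches the disk only at eviction time, and a clean victim would have been dropped without any write.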

#### **Replacement Strategies**

* **LRU (Least Recently Used)**

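  * Replaces the block that has gone unused for the longest time
  * Works well in general, but performs poorly on repeated scans such as the inner relation of a nested-loop join, where the least recently used block is the one needed next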
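
The weakness on repeated scans can be shown with a short simulation. This is a sketch, assuming a 4-block inner relation scanned three times through a 3-block buffer, and comparing LRU with a plain most-recently-used eviction rule; the function name and the sizes are chosen only for illustration:

```python
def count_misses(accesses, capacity, policy):
    """Count buffer misses for a block access sequence under LRU or MRU."""
    buffer = []      # ordered from least to most recently used
    misses = 0
    for block in accesses:
        if block in buffer:
            buffer.remove(block)            # hit: refresh its recency below
        else:
            misses += 1
            if len(buffer) == capacity:
                # LRU evicts the oldest entry, MRU evicts the newest.
                buffer.pop(0 if policy == "LRU" else -1)
        buffer.append(block)
    return misses


inner_scans = list(range(4)) * 3    # blocks 0..3, scanned three times

print("LRU misses:", count_misses(inner_scans, 3, "LRU"))   # 12
print("MRU misses:", count_misses(inner_scans, 3, "MRU"))   # 6
```

Under LRU every access misses, because each block is evicted just before the scan returns to it. Evicting the most recently used block keeps part of the relation resident across scans.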

#### **Buffer Control Techniques**

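* **Pinned Blocks**: Cannot be written back to disk or replaced until unpinned
* **Toss Immediate**: Frees a block's buffer space as soon as its last tuple has been processed
* **MRU (Most Recently Used)**: The block being processed is pinned; once its last tuple is processed it is unpinned and, as the most recently used block, becomes the first candidate for replacement
* **Forced Output**: Writes a block to disk even when its buffer space is not needed, to support recovery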

---

### **7. Summary / Learning Outcomes**

* Understanding of file organizations: heap, sequential, and multi-table clustering
* Managing records: fixed vs variable length, deletion, and compaction
* Slotted page structure for internal record management
* Metadata management via data dictionary
* Efficient buffer management to optimize disk access

---

### **Next Module Preview**

* **Indexing Mechanisms**

  * Will build upon file structure and buffer considerations
  * Focus on fast data retrieval using index structures

---

End of Notes for Module 40: File Structure and Storage.