
## **RDBMS Performance & Architecture – In-Depth Notes**

### **Learning Outcomes**

By the end of this topic, you should be able to:

1. **Evaluate** RDBMS performance and scalability as a backbone for data-intensive application development.
2. **Understand** the role of system architecture and database architecture in performance.
3. **Identify** options for scaling databases to larger sizes without sacrificing performance.

---

## 1. **Overview**

* In earlier modules, we discussed **query optimization**:

  * Every relational expression can have multiple **equivalent expressions** through *equivalence rule transformations*.
  * The alternative with the **least estimated cost** is chosen as the final execution plan → **optimized query**.
* This module evaluates **RDBMS** in terms of:

  * **Performance**
  * **Scalability**
  * Role of **system architecture** & **database architecture**.
* Also explores **options for scaling** databases to larger volumes of data.

---

## 2. **Performance in RDBMS**

### **2.1 Primary Requirement – Correctness**

* Any transaction must change only allowed data, in allowed ways.
* Guaranteed by **ACID properties**.

### **2.2 Performance Factors**

Three key metrics:

1. **Throughput (TPS)**

   * Transactions per second the system can execute.
   * **Goal:** High TPS.
2. **Response Time**

   * Delay from transaction submission to result return.
   * **Goal:** Low response time.
3. **Availability**

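   * Related to *Mean Time to Failure (MTTF)*: the longer the system runs between failures (and the faster it recovers), the higher its availability.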
   * **Goal:** High availability.
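
As a rough illustration, the three metrics can be computed from measurements as below. This is a minimal Python sketch with hypothetical function names and made-up numbers. The availability formula MTTF / (MTTF + MTTR) is a common textbook definition that also uses Mean Time To Repair (MTTR), which these notes do not cover, so treat it as an assumption.

```python
from statistics import mean


def throughput_tps(completed: int, elapsed_seconds: float) -> float:
    """Transactions completed per second over a measurement window."""
    return completed / elapsed_seconds


def mean_response_time(submit_times, finish_times) -> float:
    """Average delay from submitting a transaction to receiving its result."""
    return mean(f - s for s, f in zip(submit_times, finish_times))


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up; assumes availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(throughput_tps(12_000, 60.0))                    # 200.0
    print(mean_response_time([0.0, 1.0], [0.5, 2.0]))      # 0.75
    print(availability(mttf_hours=999.0, mttr_hours=1.0))  # 0.999
```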

---

### **2.3 Factors Influencing Performance**

#### **Transaction Level**

* **Concurrency control**
* **Query optimization**
* **RAID usage** and similar techniques.

#### **System Level**

* **System Architecture:**

  * Machines, memory, disk, connectivity, etc.
* **Database Architecture:**

  * How database processes are arranged and executed on that system architecture.

---

### **2.4 Performance Tuning**

* Identify & remove **bottlenecks**:

  * **Hardware tuning:**

    * Faster disks, more disks for parallel I/O.
    * More memory → higher buffer hit rate.
    * Faster CPU.
  * **Database tuning:**

    * Adjust **buffer size** to reduce paging.
    * Proper **checkpointing** to limit log size & recovery time.
    * **Schema modifications** (indexes, table design, transaction restructuring).
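
To see why more memory raises the buffer hit rate, the sketch below simulates a buffer pool with least-recently-used (LRU) replacement. LRU is an assumption here (real DBMSs use a variety of replacement policies), and `lru_hit_rate` and the access trace are hypothetical.

```python
from collections import OrderedDict


def lru_hit_rate(page_requests, buffer_size: int) -> float:
    """Simulate an LRU buffer pool; return the fraction of requests served from memory."""
    buffer = OrderedDict()  # page_id -> None, ordered from least to most recently used
    hits = 0
    for page in page_requests:
        if page in buffer:
            hits += 1
            buffer.move_to_end(page)  # mark as most recently used
        else:
            if len(buffer) >= buffer_size:
                buffer.popitem(last=False)  # evict the least recently used page
            buffer[page] = None  # "read" the page from disk into the buffer
    return hits / len(page_requests)


if __name__ == "__main__":
    # Hypothetical access trace cycling over 4 pages.
    trace = [1, 2, 3, 4] * 5
    for size in (2, 3, 4):
        print(size, lru_hit_rate(trace, size))  # 2 0.0 / 3 0.0 / 4 0.8
```

With this cyclic trace, buffers of 2 or 3 pages get no hits at all, while a 4-page buffer that holds the whole working set serves 80% of requests from memory: growing the buffer helps most once the frequently used pages fit.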

---

## 3. **Scalability in RDBMS**

### **3.1 Definition**

* Ability to handle **increasing data volumes** *without sacrificing performance*.
* A drop in performance (lower TPS or higher response time) as data volume grows indicates poor scalability.

---

### **3.2 Scaling Factors**

* **Volume of Data**: Database size increases.
* **Number of Users**: Sharp increase in concurrent connections.
* **Service Diversity**: New applications, transactions.
* **Geographic Spread**: Local, national, or global usage.

---

### **3.3 Achieving Scalability**

* **Tune system architecture** (better hardware, network).
* **Tune database architecture** (process distribution, caching).
* **Beyond tuning**:

  * Adjust expectations (e.g., allow slightly reduced consistency for better performance).
  * Use **alternative data models** or **hybrid systems**.

---

## 4. **RDBMS Architectures**

### **4.1 Centralized Systems**

* Single computer, no network dependency.
* Used for:

  * Small applications.
  * Desktop/single-user DBs.
* Limitations: Not scalable for large, multi-user workloads.

---

### **4.2 Client–Server Systems**

* The **first step toward scalability** beyond centralized systems.
* Network separates **clients** (send requests) and **servers** (process requests).
* Can add more clients/servers for scaling.

#### **Functional Division**

* **Front-end** (client side):

  * User interface, query forms, report generation, analytics.
* **Back-end** (server side):

  * SQL engine, transaction management.

---

#### **Server Types**

1. **Transactional Query Server**:

   * Receives SQL requests from clients through APIs such as **ODBC/JDBC**, transmitted via remote procedure calls (RPC).
   * Executes transactions and returns results.

2. **Data Server**:

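   * Ships data (e.g., pages or objects) to clients, which do the processing and send results back.
   * Suited to **compute-intensive tasks** where clients have processing power comparable to the server (e.g., object-oriented DB extensions).
   * Needs a high-speed LAN, since data moves back and forth between server and clients.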
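
A transactional query server can be sketched with Python's DB-API (PEP 249), which plays a role similar to ODBC/JDBC. This is a stand-in under clear assumptions: `sqlite3` is an embedded engine, so no network or RPC is involved, and the hypothetical `execute_transaction` function simply plays the server side, running a client's SQL statements as one transaction and returning the result.

```python
import sqlite3


def execute_transaction(conn: sqlite3.Connection, statements):
    """Hypothetical server-side handler: run (sql, params) pairs atomically
    and return the rows produced by the last statement."""
    with conn:  # commits on success, rolls back if any statement raises
        cur = conn.cursor()
        for sql, params in statements:
            cur.execute(sql, params)
        return cur.fetchall()


if __name__ == "__main__":
    server_db = sqlite3.connect(":memory:")  # stand-in for the server's database
    execute_transaction(server_db, [
        ("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)", ()),
        ("INSERT INTO account VALUES (?, ?), (?, ?)", (1, 100, 2, 50)),
    ])
    # Client request: transfer 30 from account 1 to account 2, then read balances.
    rows = execute_transaction(server_db, [
        ("UPDATE account SET balance = balance - ? WHERE id = ?", (30, 1)),
        ("UPDATE account SET balance = balance + ? WHERE id = ?", (30, 2)),
        ("SELECT id, balance FROM account ORDER BY id", ()),
    ])
    print(rows)  # [(1, 70), (2, 80)]
```

If any statement fails, the connection's context manager rolls back the whole transaction, so the client never sees a half-applied transfer.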

---

#### **Typical Client–Server Architecture**

* **Clients** → send requests via API (ODBC/JDBC).
* **Transaction Server** (a set of cooperating processes):

  * Process monitor.
  * Lock manager.
  * Checkpoint process.
  * Log writer.
  * Database writer.
  * Shared memory (buffer pool, lock table, log buffer, cached query plans).
* **Database Server**:

  * Manages storage on single or multiple disks.
  * Handles redundancy & partitioning.
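
The lock manager in the transaction server keeps a lock table recording which transactions hold which locks. A minimal sketch, assuming only shared (S) and exclusive (X) modes; the `LockManager` class and its methods are hypothetical, and a real lock manager would also queue waiting requests and deal with deadlocks.

```python
from collections import defaultdict

# Lock-mode compatibility: a shared (S) lock is compatible only with other S locks.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False, ("X", "S"): False, ("X", "X"): False}


class LockManager:
    """Hypothetical lock table: grant a lock only if it is compatible
    with every lock that other transactions hold on the same item."""

    def __init__(self):
        self.table = defaultdict(dict)  # data item -> {transaction id: mode}

    def request(self, txn: str, item: str, mode: str) -> bool:
        holders = self.table[item]
        for other, held in holders.items():
            if other != txn and not COMPATIBLE[(held, mode)]:
                return False  # caller must wait or abort; no queueing in this sketch
        if holders.get(txn) != "X":  # never downgrade an X lock already held
            holders[txn] = mode
        return True

    def release_all(self, txn: str) -> None:
        """Release every lock held by txn, e.g. when it commits or aborts."""
        for holders in self.table.values():
            holders.pop(txn, None)


if __name__ == "__main__":
    lm = LockManager()
    print(lm.request("T1", "A", "S"))  # True
    print(lm.request("T2", "A", "S"))  # True: S is compatible with S
    print(lm.request("T3", "A", "X"))  # False: X conflicts with the S locks
    lm.release_all("T1")
    lm.release_all("T2")
    print(lm.request("T3", "A", "X"))  # True
```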

---

### **4.3 Parallel Database Systems**

* Multiple processors & disks connected by **fast interconnection network** (tightly coupled).
* **Types:**

  * **Coarse-grained**: Few powerful processors.
  * **Massively parallel**: Thousands of smaller processors.

#### **Use Cases**

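* **Speedup**: Run a fixed-size problem faster by using a larger system.

  * Speedup = (time on small system) / (time on large system); ideal (linear) speedup = **n** for an n× larger system.
* **Scaleup**: Solve a proportionally larger problem in the same time by using a proportionally larger system.

  * Scaleup = (time for small problem on small system) / (time for n× larger problem on n× larger system); ideal (linear) scaleup = **1**.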
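
Speedup and scaleup are both ratios of elapsed times. The sketch below uses hypothetical timings to contrast linear and sublinear results; the function names are illustrative only.

```python
def speedup(time_small_system: float, time_large_system: float) -> float:
    """Same problem on a small vs an n-times larger system; linear speedup equals n."""
    return time_small_system / time_large_system


def scaleup(time_small_problem_small_system: float,
            time_large_problem_large_system: float) -> float:
    """n-times larger problem on an n-times larger system; linear scaleup equals 1."""
    return time_small_problem_small_system / time_large_problem_large_system


if __name__ == "__main__":
    # Hypothetical timings (seconds) on a 1-node and a 4-node system.
    print(speedup(100.0, 25.0))   # 4.0 -> linear speedup for n = 4
    print(speedup(100.0, 40.0))   # 2.5 -> sublinear speedup
    print(scaleup(100.0, 100.0))  # 1.0 -> linear scaleup
    print(scaleup(100.0, 125.0))  # 0.8 -> sublinear scaleup
```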

---

#### **Practical Issues with Parallelism**

* Startup cost: starting many processes can dominate the actual computation time.
* Resource interference (shared buses, disks, locks).
* **Skew** in processing times – slowest task limits overall speed.
* Communication overhead.
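
The effect of startup cost and skew can be shown with a toy model: every process must be started, and a parallel step ends only when its slowest partition finishes. The model is a simplifying assumption (for instance, that processes are launched one after another, so startup grows with their number), and all numbers are made up.

```python
def parallel_elapsed(partition_work, startup_per_process: float = 0.0) -> float:
    """Toy model of one parallel step over the given per-partition work times."""
    # Assumption: processes are launched sequentially, so startup grows with their count.
    startup = startup_per_process * len(partition_work)
    # Skew: the step finishes only when the slowest partition does.
    return startup + max(partition_work)


if __name__ == "__main__":
    total_work = 100.0  # hypothetical seconds of work on a single processor
    balanced = [25.0, 25.0, 25.0, 25.0]
    skewed = [10.0, 10.0, 10.0, 70.0]  # same total work, unevenly partitioned
    for parts in (balanced, skewed):
        t = parallel_elapsed(parts, startup_per_process=1.0)
        print(t, round(total_work / t, 2))
```

Both runs do the same 100 seconds of total work, but the skewed partitioning takes 74 seconds instead of 29, cutting the speedup from about 3.4 to about 1.35.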

---

#### **Interconnection Topologies**

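1. **Bus** – all components share a single bus; simple, but contention grows with the number of nodes, so it scales poorly.
2. **Mesh** – nodes arranged in a √n × √n grid, each linked to its neighbors; scales better, but a message may need up to 2(√n − 1) hops.
3. **Toroid** – mesh with wraparound links at the edges, roughly halving the maximum distance.
4. **Hypercube** – n = 2^k nodes, each linked to log₂ n others, so any two nodes are at most log₂ n hops apart; fast but expensive.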
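
The worst-case distance in each topology can be computed directly. This sketch assumes n is a perfect square for the mesh and toroid and a power of two for the hypercube; the bus always counts as one hop, because its limit is contention on the shared medium rather than distance. The `max_hops` function is hypothetical.

```python
import math


def max_hops(topology: str, n: int) -> int:
    """Worst-case number of links a message crosses between two of n nodes."""
    if topology == "bus":
        return 1  # every node is attached to the single shared bus
    if topology in ("mesh", "toroid"):
        side = math.isqrt(n)
        if side * side != n:
            raise ValueError("this sketch assumes n is a perfect square")
        if topology == "mesh":
            return 2 * (side - 1)  # corner to opposite corner
        return 2 * (side // 2)  # wraparound links roughly halve each dimension's distance
    if topology == "hypercube":
        dims = n.bit_length() - 1
        if n < 1 or (1 << dims) != n:
            raise ValueError("this sketch assumes n is a power of two")
        return dims  # each hop fixes one differing address bit: at most log2(n) hops
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for n in (16, 1024):
        print(n, {t: max_hops(t, n) for t in ("bus", "mesh", "toroid", "hypercube")})
```

For 1024 nodes, the worst case is 62 hops on a mesh, 32 on a toroid, and 10 on a hypercube.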

---

#### **Parallel Architectures**

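* **Shared Memory** – all processors share a common memory.
* **Shared Disk** – each processor has its own memory, but all share the disks.
* **Shared Nothing** – each node has its own processor, memory, and disk; nodes communicate only over the interconnection network.
* **Hybrid** – combines the above architectures.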

---

### **4.4 Distributed Database Systems**

* Data spread over multiple nodes/sites.
* **Homogeneous**: Same software, schema, hardware across sites.
* **Heterogeneous**: Different software/schema/hardware.
* Transactions can be:

  * **Local** – processed at one site.
  * **Global** – span multiple sites.
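
Whether a transaction is local or global depends on where the data it touches is stored. A minimal sketch, assuming a hypothetical catalog that maps each data item to exactly one site (no replication); with replicas, a system could sometimes serve a read locally instead.

```python
def sites_touched(items, item_site):
    """Set of sites holding the data items a transaction reads or writes."""
    return {item_site[i] for i in items}


def classify(items, item_site) -> str:
    """'local' if all items live at (at most) one site, otherwise 'global'."""
    return "local" if len(sites_touched(items, item_site)) <= 1 else "global"


if __name__ == "__main__":
    # Hypothetical catalog: data item -> site that stores it.
    item_site = {"acct_101": "site_A", "acct_102": "site_A", "acct_201": "site_B"}
    print(classify(["acct_101", "acct_102"], item_site))  # local
    print(classify(["acct_101", "acct_201"], item_site))  # global
```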

#### **Advantages**

* Data sharing across sites.
* Autonomy for local nodes.
* Higher availability via replication.

#### **Disadvantages**

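* Higher software development cost and complexity.
* Greater potential for bugs, since coordinating many sites is hard to get right.
* Increased processing overhead from exchanging messages and coordinating between sites.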

---

## 5. **Scaling Approaches**

### **5.1 Vertical Scaling (“Scale Up”)**

* Add resources to a single system (more CPU, RAM, faster storage).
* **Advantages**:

  * Cost-effective for small to medium growth.
  * Less communication overhead.
  * Easier maintenance.
* **Disadvantages**:

  * Single point of failure.
  * Bounded by how far a single machine can be upgraded.

---

### **5.2 Horizontal Scaling (“Scale Out”)**

* Add more independent systems and distribute load.
* **Advantages**:

  * Easier to expand incrementally.
  * Higher fault tolerance.
  * Can handle very large scale.
* **Disadvantages**:

  * More complex to maintain.
  * Higher initial infrastructure cost.

---

### **5.3 Popular Horizontal Scaling Patterns**

1. **Master–Slave Replication**

   * One master node handles all writes and propagates them to multiple slave nodes, which serve reads.
2. **Sharding**

   * Data split across multiple servers based on partitioning key.
3. **Mixed Models**

   * Combination of replication & sharding.
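
The three patterns can be combined in one routing layer. This is a hypothetical sketch, not any particular product's API: keys are hashed to pick a shard, writes go to that shard's master, and reads rotate across its slaves. All server names are made up.

```python
import hashlib
import itertools


class ShardedReplicatedRouter:
    """Hypothetical router for a mixed model: data is sharded by key, and each
    shard has one master (writes) and zero or more slave replicas (reads)."""

    def __init__(self, shards):
        # shards: list of {"master": name, "slaves": [names]}
        self.shards = shards
        # Round-robin over each shard's slaves; fall back to the master if it has none.
        self._next_slave = [itertools.cycle(s["slaves"] or [s["master"]]) for s in shards]

    def shard_for(self, key: str) -> int:
        # Stable hash of the partitioning key (Python's built-in hash() is salted per process).
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def route(self, key: str, is_write: bool) -> str:
        idx = self.shard_for(key)
        if is_write:
            return self.shards[idx]["master"]  # all writes go to the shard's master
        return next(self._next_slave[idx])  # reads rotate over the shard's slaves


if __name__ == "__main__":
    router = ShardedReplicatedRouter([
        {"master": "db0-master", "slaves": ["db0-r1", "db0-r2"]},
        {"master": "db1-master", "slaves": ["db1-r1", "db1-r2"]},
    ])
    print(router.route("customer:42", is_write=True))
    print(router.route("customer:42", is_write=False))
```

Reads served by slaves may lag behind the master, which is one concrete form of the reduced consistency traded for performance. Simple modulo hashing also moves most keys when a shard is added, so real deployments often prefer range partitioning or consistent hashing.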

---

## **6. Summary**

* **Performance** in RDBMS is measured by TPS, response time, and availability.
* **System architecture** (hardware, network) and **database architecture** (process arrangement, tuning) directly influence performance.
* **Scalability** must consider data volume, users, services, and geographic spread.
* **Architectural choices** range from centralized → client–server → parallel → distributed systems.
* **Scaling options**:

  * **Vertical**: Enhance existing hardware.
  * **Horizontal**: Add more machines (sharding, replication).
* Real-world speedup and scaleup are often limited by startup cost, resource interference, skew (load imbalance), and communication overhead.