# **DBMS – Non-Relational DBMS: NoSQL**

## **Learning Outcomes**

By the end of this topic, you should be able to:

1. Understand **issues in Big Data**.
2. Understand the **approach of NoSQL** and **CAP theorem** in comparison to **ACID**.
3. Identify and describe the **common types of NoSQL databases**.

---

## **1. Issues in Big Data**

### **1.1 What is Big Data?**

* **Definition:**
  Data sets so **voluminous** and **complex** that **traditional data processing applications** are inadequate to handle them.
* **Scope:**
  Goes beyond just “large” datasets — involves:

  * Diverse data types.
  * Extremely high generation and processing speeds.
  * Inconsistencies and varying data quality.

---

### **1.2 Growth of Data (2011 Study Example)**

* **Trend:**

  * Analog storage dominated until the 1980s.
  * Digital storage began rising in the early 1990s.
  * **2002**: “Beginning of the Digital Age” — digital and analog storage balanced.
  * Post-2002: Massive **explosion in digital data** due to mobile & handheld devices.
* **Example Data Volume (as reported in the 2011 study):**

  * Analog: ~19 exabytes.
  * Digital: ~300 exabytes.
* **Units:**

  * Mega (10⁶), Giga (10⁹), Tera (10¹²), Peta (10¹⁵), Exa (10¹⁸).
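
To make the prefixes concrete, the short sketch below converts the approximate figures quoted above between units. The helper name `convert` and the variable names are illustrative only, and the inputs are the rounded values from these notes rather than exact measurements.

```python
# Decimal (SI) prefixes from the list above, as powers of ten.
PREFIXES = {
    "mega": 10**6,
    "giga": 10**9,
    "tera": 10**12,
    "peta": 10**15,
    "exa": 10**18,
}


def convert(value, from_prefix, to_prefix):
    """Convert a quantity of bytes from one SI prefix to another."""
    return value * PREFIXES[from_prefix] / PREFIXES[to_prefix]


# ~300 exabytes of digital data expressed in terabytes.
digital_tb = convert(300, "exa", "tera")  # 300,000,000 TB

# Share of analog storage, using the rounded figures from the notes.
analog_share = 19 / (19 + 300)
```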

---

### **1.3 Challenges of Big Data**

* **Data Capture** — massive, continuous data inflow.
* **Storage** — even distributed storage faces limits.
* **Analysis** — extracting meaningful insights at scale.
* **Visualization** — presenting complex datasets effectively.
* **Querying** — efficient retrieval across huge, diverse datasets.

**Example:** Facebook alone represents a massive big data problem.

---

### **1.4 Big Data Is Not Only “Big”**

* Enables:

  * **Predictive Analysis** — e.g., Amazon predicting future purchases.
  * **User Behavior Analysis** — targeting ads, recommendations.
  * **Real-time insights** — trend detection, fraud prevention.

---

### **1.5 Characteristics of Big Data – The 5 V’s**

1. **Volume** — sheer amount of data.
2. **Variety** — text, images, audio, video, logs, etc.
3. **Velocity** — speed of data generation and processing.
4. **Variability** — inconsistency in data sets.
5. **Veracity** — varying data quality and accuracy.

---

## **2. Introduction to NoSQL**

### **2.1 Why NoSQL?**

* **Limitations of RDBMS** for big data:

  * Rigid schema.
  * ACID guarantees can hinder scalability.
  * Difficult to horizontally scale (scale-out).
* **NoSQL**: “Not Only SQL” — alternative approaches to storage & retrieval **other than tabular relations**.

---

### **2.2 History**

* Non-relational concepts predate relational databases:

  * **Hierarchical Model** (tree-like, IBM, 1960s).
  * **Network Model** (graph-like, early 1970s).
* The dominance of RDBMS from the late 1970s pushed these models aside.
* **Re-emergence** due to big data & Web 2.0 needs.

---

### **2.3 NoSQL is Not “Anti-SQL”**

* Name means:

  * Not **only** SQL — supports additional paradigms.
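* Origin of the term:

  * Coined: **Carlo Strozzi (1998)**, as the name of a lightweight relational database that did not expose an SQL interface.
  * Reintroduced: **Eric Evans (2009)**, for the emerging class of distributed non-relational databases.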

---

### **2.4 Advantages of NoSQL**

* Schema-free — no fixed schema required beforehand.
* Data replication to multiple sites.
* Horizontal scalability (scale-out).
* Fault tolerance — no single point of failure.
* Low-cost, easy to implement.
* High performance for key-value and write-heavy workloads.
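
As a rough illustration of how scale-out, replication, and fault tolerance fit together, the sketch below spreads keys over several nodes by hashing and stores each key on two of them. The `Cluster` class and `owners` function are hypothetical names for this example; the modulo-based placement is a deliberate simplification and is not how any particular product assigns keys.

```python
import hashlib


def owners(key, nodes, replicas=2):
    """Return the nodes holding `key`: one chosen by hash, plus the next in list order."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


class Cluster:
    """Toy cluster: each node is a dict, each key is written to `replicas` nodes."""

    def __init__(self, node_names, replicas=2):
        self.nodes = list(node_names)
        self.replicas = replicas
        self.storage = {name: {} for name in self.nodes}
        self.down = set()  # names of nodes that are currently unreachable

    def put(self, key, value):
        for name in owners(key, self.nodes, self.replicas):
            if name not in self.down:
                self.storage[name][key] = value

    def get(self, key):
        # Try each replica in turn, skipping failed nodes.
        for name in owners(key, self.nodes, self.replicas):
            if name not in self.down and key in self.storage[name]:
                return self.storage[name][key]
        raise KeyError(key)


cluster = Cluster(["n1", "n2", "n3"])
cluster.put("user:1", "Ada")

primary = owners("user:1", cluster.nodes)[0]
cluster.down.add(primary)  # simulate the first owner failing
value_after_failure = cluster.get("user:1")  # served by the surviving replica
```

Adding capacity here means adding a node name to the list. With plain modulo placement that would move most keys, which is one reason production systems tend to use more careful partitioning schemes.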

---

### **2.5 Disadvantages of NoSQL**

* No standard declarative query language (more programming effort).
* ACID properties compromised → weaker guarantees.
* Limited or partial support for relational features.
* More complex integration with existing enterprise systems.

---

### **2.6 Industry Adoption**

* **Facebook, Twitter, Reddit** → Cassandra.
* **Google** → BigTable.
* **Amazon** → Dynamo.
* **LinkedIn** → Voldemort.

---

## **3. CAP Theorem vs. ACID**

### **3.1 CAP Theorem (Eric Brewer)**

* Properties:

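  1. **Consistency (C)**: all clients see the same data at the same time, whichever node they contact.
  2. **Availability (A)**: every request to a working node receives a response (reads and writes succeed).
  3. **Partition Tolerance (P)**: the system keeps operating even when the network splits nodes into groups that cannot communicate.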

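**Rule:**
It is impossible to guarantee all three simultaneously; only **two** can be fully achieved at a time. Since network partitions cannot be ruled out in a distributed system, the practical choice during a partition is between consistency and availability.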
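
The trade-off can be seen in a toy model with two replicas. The sketch below is an illustration under simplifying assumptions: `TwoNodeStore`, its `mode` flag, and the `partitioned` switch are invented for this example, and real systems make the choice with far more nuance. In "CP" mode a write is refused while the peer is unreachable; in "AP" mode it is accepted locally and the replicas temporarily disagree.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class TwoNodeStore:
    """Two replicas that normally copy every write to each other."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False  # True while the replicas cannot communicate

    def write(self, replica, key, value):
        peer = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Give up availability: refuse rather than let replicas diverge.
                raise RuntimeError("unavailable: peer replica unreachable")
            # Give up consistency: accept locally, the peer keeps its old value.
            replica.data[key] = value
            return
        replica.data[key] = value
        peer.data[key] = value

    def read(self, replica, key):
        return replica.data.get(key)


# AP: the write succeeds during the partition, but the replicas now disagree.
ap = TwoNodeStore("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap.write(ap.a, "x", 2)

# CP: the write is rejected during the partition, so the replicas still agree.
cp = TwoNodeStore("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
try:
    cp.write(cp.a, "x", 2)
    cp_write_accepted = True
except RuntimeError:
    cp_write_accepted = False
```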

---

### **3.2 Trade-offs**

* Traditional RDBMS: **C + A** (sacrifice P).
* Many NoSQL systems:

  * **AP** (Availability + Partition Tolerance, sacrifice strong consistency).
  * **CP** (Consistency + Partition Tolerance, sacrifice availability).

---

### **3.3 Consistency Models**

* **RDBMS:** Strong consistency (ACID).
* **NoSQL:** Weak consistency — **BASE** model:

  * **Basically Available**
  * **Soft state**
  * **Eventual consistency** — system becomes consistent after some time without new updates.

---

### **3.4 Example – Eventual Consistency & Gossip Protocol**

* Data replicated across nodes (M, N, etc.).
* When a client writes to **N**, other nodes may not see it immediately.
* **Gossip Protocol**:

  * Nodes randomly share updates with other nodes over time.
  * Eventually, all nodes converge to the latest data if no further updates occur.
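
The convergence described above can be simulated in a few lines. This is a minimal sketch, not the protocol of any specific database: it assumes each node holds a single `(version, value)` pair, that a higher version always wins, and that every node contacts one random peer per round. The function names and the five node labels are illustrative.

```python
import random


def gossip_round(nodes, rng):
    """Each node contacts one random peer; both keep the newer (version, value) pair."""
    names = list(nodes)
    for name in names:
        peer = rng.choice([other for other in names if other != name])
        newest = max(nodes[name], nodes[peer])  # tuples compare by version first
        nodes[name] = newest
        nodes[peer] = newest


def rounds_until_converged(nodes, rng, max_rounds=100):
    """Run gossip rounds until all nodes hold the same pair; return the round count."""
    rounds = 0
    while len(set(nodes.values())) > 1:
        if rounds == max_rounds:
            raise RuntimeError("did not converge within max_rounds")
        gossip_round(nodes, rng)
        rounds += 1
    return rounds


# Five replicas start with the same old value; a client then writes to N only.
nodes = {name: (1, "old") for name in "MNOPQ"}
nodes["N"] = (2, "new")

rounds = rounds_until_converged(nodes, random.Random(0))
```

Between the write and convergence, a read from a node other than N may still return the old value, which is exactly the window that eventual consistency permits.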

---

### **3.5 CAP Example Classification**

* **AP:** Cassandra, Dynamo.
* **CP:** BigTable, HyperTable, MongoDB.
* **CA:** Traditional RDBMS (focus on ACID).

---

## **4. Types of NoSQL Databases**

### **Common Properties**

* Schema-less — evolves dynamically.
* Supports unstructured & semi-structured data.
* Flexible data types: new types can be added during the system's lifetime.

---

### **4.1 Key-Value Stores**

* **Data Model:** Simple key → value mapping.
* **Examples:** Amazon DynamoDB, Redis, Riak.
* **Characteristics:**

  * High performance, massive scalability.
  * Often eventually consistent and fault-tolerant (the exact guarantees vary by product).
  * Limited ability to model complex structures.

**Basic APIs:**

* GET(key) → value
* PUT(key, value)
* DELETE(key)
* EXECUTE(key, operation)
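
The four operations can be sketched as an in-memory Python class. This is only an illustration of the interface: the class name `KeyValueStore` is made up, and the meaning given to `execute` (apply a function to the stored value and save the result) is an assumption, since the notes do not define it further.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        """GET(key): return the stored value, raising KeyError if the key is absent."""
        return self._data[key]

    def put(self, key, value):
        """PUT(key, value): insert a new pair or overwrite an existing one."""
        self._data[key] = value

    def delete(self, key):
        """DELETE(key): remove the pair if present; a missing key is ignored."""
        self._data.pop(key, None)

    def execute(self, key, operation):
        """EXECUTE(key, operation): apply a callable to the value and store the result.

        This interpretation is an assumption made for the example.
        """
        result = operation(self._data[key])
        self._data[key] = result
        return result


store = KeyValueStore()
store.put("visits", 41)
store.execute("visits", lambda count: count + 1)
visits = store.get("visits")  # 42

store.delete("visits")
```

Note that every lookup goes through the key: there is no way to ask for "all values greater than 40" without scanning, which reflects the limited modelling ability listed above.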

---

### **4.2 Document Stores**

* **Data Model:** Store data as structured/semi-structured documents (e.g., JSON, XML).
* Can represent hierarchical and complex objects.
* **Examples:** MongoDB, Couchbase.
* **Advantages:**

  * Flexible, human-readable format.
  * Supports nested fields and arrays.
* **Structure Example (JSON):**

```json
{
  "name": "John",
  "addresses": ["NY", "LA"],
  "phone": ["1234", "5678"]
}
```
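
To show how such documents might be stored and queried, here is a small sketch of a schema-free collection. `DocumentCollection` and its `find` method are hypothetical and far simpler than a real document database; the second document and its `email` field are invented to show that documents in one collection need not share the same fields.

```python
import json


class DocumentCollection:
    """Toy collection of JSON-like documents with no fixed schema."""

    def __init__(self):
        self._docs = []

    def insert(self, document):
        # Round-trip through JSON so only JSON-serialisable data is stored,
        # and so the collection keeps its own copy of the document.
        self._docs.append(json.loads(json.dumps(document)))

    def find(self, field, value):
        """Return documents whose field equals value, or is an array containing it."""
        matches = []
        for doc in self._docs:
            stored = doc.get(field)
            if stored == value or (isinstance(stored, list) and value in stored):
                matches.append(doc)
        return matches


people = DocumentCollection()
people.insert({"name": "John", "addresses": ["NY", "LA"], "phone": ["1234", "5678"]})
people.insert({"name": "Mary", "addresses": ["LA"], "email": "mary@example.com"})

in_la = people.find("addresses", "LA")  # matches inside the nested arrays
johns = people.find("name", "John")
```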

---

### **4.3 Column Stores**

* **Based on:** Google BigTable model.
* **Data Organization:**

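  * Data is stored in **column families**: groups of related columns, where each column is a key-value pair.
  * Some systems also offer **super columns** (a column whose value is itself a set of columns).
  * Each value is indexed by row key, column key, and timestamp.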
* **Examples:** Google BigTable, Apache Cassandra, HBase.
* **Advantages:**

  * Handles semi-structured data efficiently.
  * Variable number of columns per row.
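
The three-part index (row key, column key, timestamp) can be sketched with nested dictionaries. This is an illustrative model only: `ColumnFamilyTable` is a made-up class, the `family:qualifier` naming of column keys is an assumed convention for the example, and timestamps are plain integers.

```python
class ColumnFamilyTable:
    """Toy wide-column table: row key -> column key -> {timestamp: value}."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column_key, value, timestamp):
        versions = self._rows.setdefault(row_key, {}).setdefault(column_key, {})
        versions[timestamp] = value

    def get(self, row_key, column_key, timestamp=None):
        """Return the newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows[row_key][column_key]
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            raise KeyError((row_key, column_key, timestamp))
        return versions[max(candidates)]

    def columns(self, row_key):
        """List the column keys present in one row; rows may differ from each other."""
        return sorted(self._rows[row_key])


table = ColumnFamilyTable()
table.put("user1", "profile:name", "John", timestamp=1)
table.put("user1", "contact:phone", "1234", timestamp=2)
table.put("user2", "profile:name", "Mary", timestamp=3)
table.put("user1", "profile:name", "Johnny", timestamp=5)

latest_name = table.get("user1", "profile:name")               # "Johnny"
earlier_name = table.get("user1", "profile:name", timestamp=3)  # "John"
```

Here `user1` has two columns and `user2` has one, which illustrates the variable number of columns per row.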

---

### **4.4 Graph Stores**

* **Data Model:** Nodes + edges (property graph model).
* **Properties:**

  * Nodes have attributes (properties).
  * Edges have labels/roles.
* **Advantages:**

  * Ideal for relationships and networks.
  * Supports recursive traversals and path queries.
* **Examples:** Neo4j, AllegroGraph, Titan.
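
A property graph and a path query over it can be sketched as follows. The `PropertyGraph` class, the node names, and the `FOLLOWS`/`BLOCKS` labels are all invented for illustration; graph databases provide their own query languages for this, whereas the sketch uses a plain breadth-first search.

```python
from collections import deque


class PropertyGraph:
    """Toy property graph: nodes carry properties, directed edges carry a label."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of properties
        self.edges = {}  # node id -> list of (label, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def shortest_path(self, start, goal, label=None):
        """Breadth-first search along edges, optionally restricted to one label.

        Returns the list of node ids on a shortest path, or None if unreachable.
        """
        queue = deque([[start]])
        visited = {start}
        while queue:
            path = queue.popleft()
            current = path[-1]
            if current == goal:
                return path
            for edge_label, target in self.edges[current]:
                if (label is None or edge_label == label) and target not in visited:
                    visited.add(target)
                    queue.append(path + [target])
        return None


graph = PropertyGraph()
for person in ("alice", "bob", "carol", "dave"):
    graph.add_node(person, kind="user")
graph.add_edge("alice", "FOLLOWS", "bob")
graph.add_edge("bob", "FOLLOWS", "carol")
graph.add_edge("alice", "BLOCKS", "dave")

path = graph.shortest_path("alice", "carol", label="FOLLOWS")
no_path = graph.shortest_path("alice", "dave", label="FOLLOWS")
```

Expressing the same "follows, any number of hops" query in a relational schema would typically need a recursive join, which is why traversals are listed as a strength of graph stores.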

---

## **5. Relational vs. Non-Relational Summary**

| Feature            | Relational (SQL)              | Non-Relational (NoSQL)         |
| ------------------ | ----------------------------- | ------------------------------ |
| **Data Model**     | Structured only               | Unstructured & Semi-structured |
| **Schema**         | Fixed                         | Dynamic/Evolving               |
| **Consistency**    | Strong (ACID)                 | Eventual (BASE)                |
| **Scalability**    | Mainly vertical (scale-up)    | Horizontal (scale-out)         |
| **Integration**    | Strong enterprise integration | Designed for cloud/web apps    |
| **Query Language** | SQL                           | Varies (no standard)           |

---

## **Key Takeaways**

* **Big Data** challenges make traditional RDBMS insufficient in certain contexts.
* **NoSQL** provides flexible, scalable alternatives, trading strict ACID guarantees for performance and scalability.
* **CAP theorem** explains trade-offs in distributed systems — only 2 of C, A, P achievable at once.
* **BASE** model accepts eventual consistency in exchange for availability and scalability.
* **Four major NoSQL types**:

  1. Key-Value Stores
  2. Document Stores
  3. Column Stores
  4. Graph Stores
* Choice between SQL and NoSQL depends on **application requirements**, especially **consistency vs availability** needs.