<a href="https://colab.research.google.com/github/ankitarm/SQL_Data_Engineer/blob/main/SQL_Theory.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### SQL commands?
| SQL Command Type | Description                                       | Examples                             |
|------------------|---------------------------------------------------|--------------------------------------|
| DDL (Data Definition Language) | Defines database schema and structure        | `CREATE`, `ALTER`, `DROP`, `TRUNCATE` |
| DML (Data Manipulation Language) | Manages data within tables                  | `INSERT`, `UPDATE`, `DELETE` (some references also list `SELECT` here) |
| DCL (Data Control Language) | Controls access to data                     | `GRANT`, `REVOKE`                    |
| TCL (Transaction Control Language) | Manages transactions and changes           | `COMMIT`, `ROLLBACK`, `SAVEPOINT`    |
| DQL (Data Query Language) | Retrieves data from the database             | `SELECT`                             |


### CHAR and VARCHAR data types?
| Feature            | CHAR                         | VARCHAR                          |
|--------------------|------------------------------|----------------------------------|
| Storage Length     | Fixed-length                 | Variable-length                  |
| Padding            | Pads with spaces to full length | No padding                       |
| Performance        | Can be slightly faster for fixed-size data (engine-dependent) | More efficient for varying sizes |
| Storage Usage      | Always uses defined size     | Uses only the space needed, plus a small length prefix |
| Use Case           | Best for consistent-length data (e.g., codes) | Best for varying-length data (e.g., names) |
| Example            | `CHAR(5)` stores `'USA  '` (padded with 2 spaces)      | `VARCHAR(5)` stores `'USA'` (uses only 3 characters)      |
| SQL Example        | `CREATE TABLE Country (code CHAR(5));`                | `CREATE TABLE Person (name VARCHAR(50));`                |

### Primary key?
A primary key is a column, or a set of columns, that uniquely identifies each row in a table. Its values must be unique and cannot be NULL, and a table can have only one primary key.
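
As a rough illustration of both rules, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `country` table and its rows are made up for the example. `NOT NULL` is written out explicitly because SQLite, unlike most engines, does not imply it for non-integer primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; NOT NULL is explicit because of the SQLite quirk noted above.
conn.execute("CREATE TABLE country (code TEXT PRIMARY KEY NOT NULL, name TEXT)")
conn.execute("INSERT INTO country VALUES ('US', 'United States')")

rejected = []
for row in [("US", "Duplicate key"), (None, "Missing key")]:
    try:
        conn.execute("INSERT INTO country VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        # Raised for both the duplicate value and the NULL value.
        rejected.append(row[1])

row_count = conn.execute("SELECT COUNT(*) FROM country").fetchone()[0]
print(rejected, row_count)  # ['Duplicate key', 'Missing key'] 1
```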

### WHERE and HAVING clauses.

---

| Feature           | WHERE Clause                                 | HAVING Clause                                 |
|-------------------|-----------------------------------------------|------------------------------------------------|
| Purpose           | Filters rows **before** grouping or aggregation | Filters groups **after** aggregation          |
| Applies To        | Individual rows                               | Aggregated/grouped data                       |
| Used With         | `SELECT`, `UPDATE`, `DELETE`                  | `SELECT`, usually together with **GROUP BY**  |
| Aggregate Functions | Cannot use aggregate functions (like SUM, AVG) | Can use aggregate functions                  |
| Execution Order   | Applied **first** (before GROUP BY)           | Applied **after GROUP BY and aggregation**    |
| Example Usage     | `WHERE age > 30`                              | `HAVING COUNT(*) > 2`                         |


---

```sql
-- Using WHERE to filter rows
SELECT * FROM employees
WHERE department = 'Sales';

-- Using HAVING to filter groups
SELECT department, COUNT(*)
FROM employees
GROUP BY department
HAVING COUNT(*) > 10;
```

Logical order in which SQL evaluates these clauses:
```
FROM → Get data from tables
WHERE → Filter rows
GROUP BY → Group the filtered rows
HAVING → Filter the grouped results
SELECT → Choose the columns to return
ORDER BY / LIMIT → Final sorting and limiting
```
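
To see the row filter and the group filter act at different stages, here is a minimal sketch using Python's `sqlite3` module. The sample rows and the `age` column are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 45), ("Bob", "Sales", 38), ("Cy", "Sales", 25),
        ("Dee", "HR", 50), ("Eli", "HR", 29),
    ],
)

rows = conn.execute(
    """
    SELECT department, COUNT(*) AS headcount
    FROM employees
    WHERE age > 30            -- row filter: applied before grouping
    GROUP BY department
    HAVING COUNT(*) >= 2      -- group filter: applied after aggregation
    ORDER BY department
    """
).fetchall()
print(rows)  # [('Sales', 2)]
```

`WHERE` first removes Cy and Eli, leaving two Sales rows and one HR row. `HAVING` then drops the HR group because its count is below 2.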

### Stored procedure?

A stored procedure is a named set of SQL statements that is stored in the database and executed with a single call. It makes logic reusable, and many engines cache its compiled execution plan, which can improve performance.

### UNION and UNION ALL.

`UNION` combines the results of two or more `SELECT` statements into a single result set and eliminates duplicate rows. `UNION ALL` also combines the results but retains all rows, including duplicates, so it avoids the cost of duplicate elimination. In both cases the `SELECT` statements must return the same number of columns.
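
A small sketch of the difference, run through Python's `sqlite3` module. The two customer tables and their email values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers_2023 (email TEXT)")
conn.execute("CREATE TABLE customers_2024 (email TEXT)")
conn.executemany(
    "INSERT INTO customers_2023 VALUES (?)", [("a@example.com",), ("b@example.com",)]
)
conn.executemany(
    "INSERT INTO customers_2024 VALUES (?)", [("b@example.com",), ("c@example.com",)]
)

# UNION removes the duplicate b@example.com row.
union_rows = conn.execute(
    "SELECT email FROM customers_2023 UNION "
    "SELECT email FROM customers_2024 ORDER BY email"
).fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT email FROM customers_2023 UNION ALL "
    "SELECT email FROM customers_2024 ORDER BY email"
).fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
```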

### Normalization?
Normalization is the process of organizing data in a database to minimize redundancy and avoid update anomalies. Large tables are divided into smaller ones, and the relationships between them are defined through keys.

### Denormalization?
Denormalization is the process
of adding redundant data to a
normalized database to improve read
performance by reducing the number of
joins needed to retrieve data.
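
The trade-off between the two designs can be sketched with Python's `sqlite3` module. The `customers`, `orders`, and `orders_wide` tables are hypothetical: the first two form a normalized design, and the third copies customer fields onto every order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized: the city of each customer is stored exactly once.
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Alice', 'Paris'), (2, 'Bob', 'Rome');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 35.0), (12, 2, 15.0);

    -- Denormalized: customer fields are copied onto every order row.
    CREATE TABLE orders_wide AS
    SELECT o.order_id, o.amount, c.name, c.city
    FROM orders AS o JOIN customers AS c ON c.customer_id = o.customer_id;
    """
)

# The normalized read needs a join; the denormalized read does not.
joined = conn.execute(
    "SELECT o.order_id, c.city FROM orders AS o "
    "JOIN customers AS c ON c.customer_id = o.customer_id ORDER BY o.order_id"
).fetchall()
wide = conn.execute("SELECT order_id, city FROM orders_wide ORDER BY order_id").fetchall()

# The price of skipping the join: 'Paris' is now stored once per order.
paris_copies = conn.execute(
    "SELECT COUNT(*) FROM orders_wide WHERE city = 'Paris'"
).fetchone()[0]
print(joined == wide, paris_copies)  # True 2
```

If Alice moves, the normalized design needs one update, while the denormalized table needs every one of her order rows changed.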

### ACID properties.
| Property      | Description                                                                 |
|---------------|-----------------------------------------------------------------------------|
| **A - Atomicity**   | Ensures that a transaction is **all or nothing**: either all operations succeed, or none do. |
| **C - Consistency** | Ensures the database remains in a **valid state** before and after the transaction.        |
| **I - Isolation**   | Ensures that concurrent transactions **do not interfere** with each other.               |
| **D - Durability**  | Once a transaction is committed, the **changes are permanent**, even in case of a crash. |


💡 Transaction Steps (in SQL):

```sql
BEGIN TRANSACTION;

-- Step 1: Deduct $100 from Account A
UPDATE accounts
SET balance = balance - 100
WHERE account_id = 'A';

-- Step 2: Add $100 to Account B
UPDATE accounts
SET balance = balance + 100
WHERE account_id = 'B';

-- Step 3: Commit the transaction
COMMIT;
```

---

🔄 Transaction Flow Diagram

```markdown
Start Transaction
     |
     v
[ Deduct $100 from A ]      --> (Atomicity: If this fails, everything rolls back)
     |
     v
[ Add $100 to B ]           --> (Consistency: Ensures total funds are correct)
     |
     v
[ COMMIT ]                  --> (Durability: Changes are now permanent)
     |
     v
[ Other users see changes ] --> (Isolation: No user sees partial changes)
```

---

📋 ACID Explained with This Example

| Property        | How It Applies in the Transfer Scenario                                   |
| --------------- | ------------------------------------------------------------------------- |
| **Atomicity**   | If either deduction or addition fails, **both changes are rolled back**.  |
| **Consistency** | The **total money remains consistent** in the system after the transfer.  |
| **Isolation**   | Other transactions can't see the transfer **until it's fully committed**. |
| **Durability**  | Even if the system crashes right after commit, **changes are saved**.     |


### VIEW and a materialized view
| Feature               | View                                      | Materialized View                          |
|------------------------|-------------------------------------------|---------------------------------------------|
| Definition             | Virtual table based on a SQL query        | Physical table storing query results        |
| Storage                | Does **not** store data                   | **Stores** data physically                  |
| Data Freshness         | Always reflects latest data               | Needs to be **refreshed** manually or on schedule |
| Performance            | Slower for large data (recomputes each time) | Faster as data is precomputed              |
| Update Frequency       | Always current                            | Can be **stale** until refreshed            |
| Use Case               | Real-time querying, lightweight reporting | Caching heavy queries, performance boost    |
| Maintenance            | No maintenance required                   | Requires **refresh strategy** (manual/on commit/on schedule) |

---

💡 Example:

🔹 View

```sql
CREATE VIEW active_users AS
SELECT * FROM users WHERE status = 'active';
```

🔹 Materialized View (in PostgreSQL or Oracle)

```sql
CREATE MATERIALIZED VIEW active_users_mv AS
SELECT * FROM users WHERE status = 'active';
```

To refresh (PostgreSQL syntax):

```sql
REFRESH MATERIALIZED VIEW active_users_mv;
```
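
The freshness difference can be sketched with Python's `sqlite3` module. SQLite has ordinary views but no materialized views, so the snapshot table `active_users_mv` below is only a hand-rolled stand-in, and the delete-and-reinsert "refresh" is an assumption made for the illustration rather than how PostgreSQL or Oracle implement it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (name TEXT, status TEXT);
    INSERT INTO users VALUES ('alice', 'active'), ('bob', 'inactive');

    CREATE VIEW active_users AS
    SELECT name FROM users WHERE status = 'active';

    -- Stand-in for a materialized view: a table holding a snapshot.
    CREATE TABLE active_users_mv AS
    SELECT name FROM users WHERE status = 'active';
    """
)

# The base table changes after both objects were created.
conn.execute("UPDATE users SET status = 'active' WHERE name = 'bob'")

view_count = conn.execute("SELECT COUNT(*) FROM active_users").fetchone()[0]
stale_count = conn.execute("SELECT COUNT(*) FROM active_users_mv").fetchone()[0]

# Simulated refresh: rebuild the snapshot from the base table.
conn.execute("DELETE FROM active_users_mv")
conn.execute(
    "INSERT INTO active_users_mv SELECT name FROM users WHERE status = 'active'"
)
fresh_count = conn.execute("SELECT COUNT(*) FROM active_users_mv").fetchone()[0]

print(view_count, stale_count, fresh_count)  # 2 1 2
```

The view sees Bob immediately because its query runs on every read; the snapshot only catches up after the refresh.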

### CLUSTERED and NON-CLUSTERED index?
A clustered index determines the physical order of rows in a table. Some engines, such as SQL Server and MySQL's InnoDB, create it on the primary key column(s) by default. A non-clustered index does not affect the physical order of rows and is stored separately from the table data.

---

📚 Real-World Analogy: Think of a Book

* **CLUSTERED INDEX** = The **main content pages**, sorted in a specific order.
* **NON-CLUSTERED INDEX** = The **index pages at the back** of a book, pointing to specific pages.

---

| Feature              | CLUSTERED INDEX                               | NON-CLUSTERED INDEX                           |
|----------------------|------------------------------------------------|------------------------------------------------|
| Data Storage         | **Stores actual data** in sorted order         | Stores **pointers** to the actual data         |
| Number Allowed       | Only **one** per table                         | Can have **multiple**                         |
| Sorting              | Data is physically sorted by this index        | Doesn't affect data's physical order           |
| Speed                | Faster for **range queries**                   | Faster for **point lookups** (exact matches)   |
| Example Use Case     | Sorting by `OrderDate` in an orders table      | Searching for users by `Email` or `Phone`      |

---

💡 Example in SQL

📌 Clustered Index (created by default on the **Primary Key** in SQL Server and MySQL InnoDB):

```sql
CREATE TABLE employees (
    emp_id INT PRIMARY KEY, -- Becomes the clustered index by default (e.g., in SQL Server)
    name VARCHAR(100),
    department VARCHAR(50)
);
```

🔍 Non-Clustered Index (SQL Server syntax; most other engines use plain `CREATE INDEX`):

```sql
CREATE NONCLUSTERED INDEX idx_dept
ON employees(department);
```

---
🧠 Summary:

* **CLUSTERED INDEX**: Sorts and stores the actual rows of data in the table. Only one allowed.
* **NON-CLUSTERED INDEX**: Has a separate structure that points to the rows. You can have many of these.
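
One way to watch a secondary index change how a query is answered is SQLite's `EXPLAIN QUERY PLAN`, sketched below through Python's `sqlite3` module. SQLite has no `CLUSTERED` or `NONCLUSTERED` keywords, so this only approximates the idea: the table is stored in `emp_id` order, and `idx_dept` is a separate structure pointing back to the rows. The exact plan wording varies between SQLite versions, so the code only looks for key words in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [(f"emp{i}", "Sales" if i % 2 else "HR") for i in range(100)],
)

query = "EXPLAIN QUERY PLAN SELECT name FROM employees WHERE department = 'Sales'"

# The last column of each plan row is a human-readable description.
plan_before = " ".join(row[-1] for row in conn.execute(query))

conn.execute("CREATE INDEX idx_dept ON employees(department)")
plan_after = " ".join(row[-1] for row in conn.execute(query))

print("before:", plan_before)  # a full table scan
print("after: ", plan_after)   # a search that names idx_dept
```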



### DELETE and TRUNCATE commands?
`DELETE` is a DML command that removes the rows matching a specified condition (or all rows if no `WHERE` clause is given) and logs each deleted row. `TRUNCATE` is a DDL command that removes all rows from a table without logging individual row deletions, which usually makes it faster.

### VIEW and a TABLE?
A table is a physical structure that stores data, while a view is a virtual table that does not store data itself but provides a way to present data stored in one or more tables.

### TRIGGER?
A trigger is a database object that automatically executes a specified set of SQL statements when certain events occur, such as inserting, updating, or deleting rows in a table.
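
As an illustration, the sketch below defines an audit trigger in SQLite through Python's `sqlite3` module. The `balance_audit` table and the trigger name `log_balance_change` are hypothetical, and trigger syntax differs somewhat between engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (account_id TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE balance_audit (
        account_id TEXT, old_balance INTEGER, new_balance INTEGER
    );

    -- Fires once per row touched by an UPDATE that sets the balance column.
    CREATE TRIGGER log_balance_change
    AFTER UPDATE OF balance ON accounts
    FOR EACH ROW
    BEGIN
        INSERT INTO balance_audit
        VALUES (OLD.account_id, OLD.balance, NEW.balance);
    END;

    INSERT INTO accounts VALUES ('A', 500);
    """
)

# No explicit call to the trigger: the UPDATE alone causes the audit row.
conn.execute("UPDATE accounts SET balance = balance - 100 WHERE account_id = 'A'")
audit_rows = conn.execute("SELECT * FROM balance_audit").fetchall()
print(audit_rows)  # [('A', 500, 400)]
```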

### UNIQUE key and a PRIMARY key?

Both constraints enforce uniqueness; they differ in how they treat NULL and in how many a table can have. The classroom example below shows the difference.

---

📦 Think of a Table Like a Classroom

Imagine a table like a list of students in a classroom.

| student_id | name    | email             |
| ---------- | ------- | ----------------- |
| 1          | Alice   | alice@email.com   |
| 2          | Bob     | bob@email.com     |
| 3          | Charlie | charlie@email.com |

---

🔑 PRIMARY KEY

* It's like a **roll number**.
* It **must be unique**: no two students can have the same roll number.
* It **cannot be empty**: every student **must have** a roll number.

💡 So `student_id` is a **PRIMARY KEY**.

---

✅ UNIQUE KEY

* It's like an **email address**.
* It should also be **unique**: no two students should have the same email.
* But it's **OK if someone doesn't have an email**, meaning it can be **NULL**. Most engines allow several NULLs in a unique column; SQL Server allows only one.

💡 So `email` can be a **UNIQUE KEY**.

---

🔄 Main Differences

| Feature        | PRIMARY KEY                  | UNIQUE KEY                    |
| -------------- | ---------------------------- | ----------------------------- |
| Must be unique | ✅ Yes                        | ✅ Yes                         |
| Can be NULL?   | ❌ No                         | ✅ Yes (how many NULLs are allowed varies by engine) |
| How many?      | Only **one** per table       | Can have **many**             |
| Main use       | Identifies each row uniquely | Enforces uniqueness on column |

---

📌 Summary

* Use **PRIMARY KEY** to uniquely identify each row (like `student_id`).
* Use **UNIQUE KEY** to make sure no duplicates exist (like `email`); the value itself is optional (can be NULL).
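
The classroom table can be tried out with Python's `sqlite3` module, as in the sketch below. The fourth student, Dana, is invented to trigger the duplicate check. SQLite permits several NULLs in a `UNIQUE` column, which matches most engines but not SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "student_id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE)"
)
# Bob and Charlie have no email; the UNIQUE constraint accepts both NULLs here.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Alice", "alice@email.com"), (2, "Bob", None), (3, "Charlie", None)],
)

try:
    # Same email as Alice, so the UNIQUE constraint rejects the row.
    conn.execute("INSERT INTO students VALUES (4, 'Dana', 'alice@email.com')")
    duplicate_email_rejected = False
except sqlite3.IntegrityError:
    duplicate_email_rejected = True

null_emails = conn.execute(
    "SELECT COUNT(*) FROM students WHERE email IS NULL"
).fetchone()[0]
print(duplicate_email_rejected, null_emails)  # True 2
```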


### Correlated and non-correlated subquery?
A correlated subquery depends on the outer query for its values, so it is logically evaluated once for each outer row. A non-correlated subquery can be executed independently of the outer query.
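
The two kinds can be compared side by side with Python's `sqlite3` module. The `employees` rows and salaries below are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Sales", 90), ("Bob", "Sales", 60), ("Cy", "HR", 50), ("Dee", "HR", 40)],
)

# Non-correlated: the inner query stands alone and yields one company-wide average.
above_company_avg = conn.execute(
    """
    SELECT name FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
    """
).fetchall()

# Correlated: the inner query refers to e.department from the current outer row.
above_dept_avg = conn.execute(
    """
    SELECT name FROM employees AS e
    WHERE salary > (
        SELECT AVG(salary) FROM employees WHERE department = e.department
    )
    ORDER BY name
    """
).fetchall()

print(above_company_avg)  # [('Ann',)]
print(above_dept_avg)     # [('Ann',), ('Cy',)]
```

Cy earns less than the company average of 60 but more than the HR average of 45, which is why only the correlated query returns that row.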


 ### COMMIT and ROLLBACK commands?

`COMMIT` and `ROLLBACK` both end the current transaction, with opposite outcomes:

| Command   | Purpose                                                                 |
|-----------|-------------------------------------------------------------------------|
| COMMIT    | Saves all the changes made in the current transaction **permanently** to the database. |
| ROLLBACK  | **Undoes** all changes made in the current transaction, restoring the previous state.   |

---

✅ When to Use:

* **Use `COMMIT`** when:
  * All your operations are successful.
  * You're confident the data is correct and ready to be saved.
* **Use `ROLLBACK`** when:
  * An error occurs mid-transaction.
  * You want to cancel all operations performed so far.

---

💡 Example:

```sql
BEGIN TRANSACTION;

UPDATE accounts
SET balance = balance - 100
WHERE account_id = 'A';

-- Suppose an error happens here (e.g., balance too low or B doesn't exist)

UPDATE accounts
SET balance = balance + 100
WHERE account_id = 'B';

-- If every statement succeeded, make the changes permanent:
COMMIT;

-- If something failed, run this instead of COMMIT to undo both updates:
-- ROLLBACK;
```
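
Plain SQL scripts cannot branch on errors, so the choice between `COMMIT` and `ROLLBACK` is usually made by the calling code. The sketch below shows one way to do that with Python's `sqlite3` module. The `transfer` helper, the starting balances, and the `CHECK (balance >= 0)` constraint are assumptions added for the illustration; B is credited first so the rollback visibly undoes a step that had already succeeded.

```python
import sqlite3

# isolation_level=None leaves BEGIN/COMMIT/ROLLBACK entirely to this code.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE accounts ("
    "account_id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('A', 150), ('B', 0)")


def transfer(amount):
    """Move `amount` from A to B; return True if committed, False if rolled back."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE account_id = 'B'",
            (amount,),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE account_id = 'A'",
            (amount,),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # also undoes the credit already applied to B
        return False


first = transfer(100)   # A has enough funds, so this commits
second = transfer(100)  # would overdraw A, so the CHECK fails and it rolls back
balances = dict(conn.execute("SELECT account_id, balance FROM accounts"))
print(first, second, balances)  # True False {'A': 50, 'B': 100}
```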

 ### COALESCE()

The **`COALESCE()`** function in SQL is used to **return the first non-NULL value** from a list of expressions.

---
🧠 **Purpose**:

To **handle NULL values** by providing a default or fallback value in queries.

---

| Feature         | Description                                                              |
|-----------------|---------------------------------------------------------------------------|
| Function Name   | COALESCE()                                                                |
| Purpose         | Returns the **first non-NULL** value from the list of arguments           |
| Common Use Case | Replacing NULLs with default values in SELECT statements                 |
| Returns         | NULL only if **all** arguments are NULL                                   |

---

💡 **Syntax**:

```sql
COALESCE(expression1, expression2, ..., expressionN)
```

---

🔹 **Example**:

 SQL Query:

```sql
SELECT name, COALESCE(phone, 'No Phone') AS contact_info
FROM employees;
```

 Meaning:

* If `phone` is `NULL`, it returns `'No Phone'` instead.
* If `phone` has a value, it returns that value.
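
With more than two arguments, `COALESCE()` works as a fallback chain. The sketch below runs such a query through Python's `sqlite3` module; the `office_phone` column and the sample rows are hypothetical additions to the `employees` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, phone TEXT, office_phone TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "555-0100", None), ("Bob", None, "555-0199"), ("Cy", None, None)],
)

# Arguments are tried left to right; the first non-NULL one is returned.
contacts = conn.execute(
    """
    SELECT name, COALESCE(phone, office_phone, 'No Phone') AS contact_info
    FROM employees
    ORDER BY name
    """
).fetchall()
print(contacts)
# [('Ann', '555-0100'), ('Bob', '555-0199'), ('Cy', 'No Phone')]
```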

---
