# Introduction to Relational Model/2

## Module recap
- Basic notions of modeling introduced
    - Attributes and their Types
    - Schema and Instance
    - Keys and their Categorization
- Language for the Relational Model introduced

## Relational Operators

#### Basic properties of relations
- **A relation is a set.** Hence,
- **Ordering of rows / tuples is inconsequential**
- **All rows / tuples must be distinct**
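
A minimal Python sketch of these properties, assuming a relation is modeled as a `set` of tuples (an illustrative choice, not part of the relational model itself):

```python
# Two ways of writing down the same relation: the rows are listed in a
# different order, and one row is written twice in r2.
r1 = {(101, "Alice"), (102, "Bob")}
r2 = {(102, "Bob"), (101, "Alice"), (101, "Alice")}

# Order is irrelevant and the repeated tuple collapses into one.
same_relation = (r1 == r2)
tuple_count = len(r2)

print(same_relation, tuple_count)  # True 2
```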

## Select (σ) Operation in Relational Algebra

- **Symbol:** σ (sigma)  
- **Purpose:** To **filter rows (tuples)** from a relation based on a given condition (predicate).  
- **Input:** A relation (table)  
- **Output:** A relation containing only those tuples that satisfy the condition.

---

### Syntax:

$$
\sigma_{\text{condition}}(\text{Relation})
$$

---

### Example Relation: **Students**

| StudentID | Name    | Age | Major   |
|-----------|---------|-----|---------|
| 101       | Alice   | 22  | CS      |
| 102       | Bob     | 20  | Math    |
| 103       | Charlie | 23  | CS      |
| 104       | David   | 21  | Physics |

---

### Example 1: Using **AND (∧)** operator

Select students who are majoring in **CS** **and** whose age is greater than **21**:

$$
\sigma_{Major = 'CS' \ \land \ Age > 21}(Students)
$$

**Result:**

| StudentID | Name    | Age | Major   |
|-----------|---------|-----|---------|
| 101       | Alice   | 22  | CS      |
| 103       | Charlie | 23  | CS      |

---

### Example 2: Using **OR (∨)** operator

Select students who are majoring in **CS** **or** whose age is **20**:

$$
\sigma_{Major = 'CS' \ \lor \ Age = 20}(Students)
$$

**Result:**

| StudentID | Name    | Age | Major   |
|-----------|---------|-----|---------|
| 101       | Alice   | 22  | CS      |
| 102       | Bob     | 20  | Math    |
| 103       | Charlie | 23  | CS      |

---

### Summary:
- The **Select (σ)** operation filters rows by a condition.
- Multiple conditions can be combined using logical operators like:
  - **AND (∧)** — all conditions must be true.
  - **OR (∨)** — at least one condition must be true.
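
The two selections above can be sketched in Python. The `Student` type and the `select` helper are hypothetical names introduced only for this illustration; the relation is modeled as a set of named tuples.

```python
from typing import NamedTuple


class Student(NamedTuple):
    student_id: int
    name: str
    age: int
    major: str


students = {
    Student(101, "Alice", 22, "CS"),
    Student(102, "Bob", 20, "Math"),
    Student(103, "Charlie", 23, "CS"),
    Student(104, "David", 21, "Physics"),
}


def select(relation, predicate):
    """sigma_predicate(relation): keep only the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


# Example 1: Major = 'CS' AND Age > 21
cs_and_over_21 = select(students, lambda s: s.major == "CS" and s.age > 21)
# Example 2: Major = 'CS' OR Age = 20
cs_or_age_20 = select(students, lambda s: s.major == "CS" or s.age == 20)

print(sorted(s.name for s in cs_and_over_21))  # ['Alice', 'Charlie']
print(sorted(s.name for s in cs_or_age_20))    # ['Alice', 'Bob', 'Charlie']
```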

## Projection (π) Operation in Relational Algebra

- **Symbol:** π (pi)  
- **Purpose:** To **select specific columns (attributes)** from a relation, essentially reducing the relation to only those columns.  
- **Input:** A relation (table)  
- **Output:** A relation with only the specified attributes, and duplicates removed (since relations are sets).

---

### Syntax:

$$
\pi_{\text{attribute}_1, \text{attribute}_2, \ldots}(\text{Relation})
$$

---

### Example Relation: **Students**

| StudentID | Name    | Age | Major   |
|-----------|---------|-----|---------|
| 101       | Alice   | 22  | CS      |
| 102       | Bob     | 20  | Math    |
| 103       | Charlie | 23  | CS      |
| 104       | David   | 21  | Physics |

---

### Example:

To get only the **Name** and **Major** of all students:

$$
\pi_{Name, Major}(Students)
$$

**Result:**

| Name    | Major   |
|---------|---------|
| Alice   | CS      |
| Bob     | Math    |
| Charlie | CS      |
| David   | Physics |

---

### Note:
- The projection operation removes duplicate rows in the output.
- It is used when you want to retrieve only certain columns from a table.
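
A small Python sketch of projection, assuming rows are plain tuples and the schema is kept in a separate `SCHEMA` tuple (both are illustrative choices). Projecting on `Major` alone shows the duplicate removal that the Name/Major example does not exercise.

```python
SCHEMA = ("StudentID", "Name", "Age", "Major")

students = {
    (101, "Alice", 22, "CS"),
    (102, "Bob", 20, "Math"),
    (103, "Charlie", 23, "CS"),
    (104, "David", 21, "Physics"),
}


def project(relation, schema, attributes):
    """pi_attributes(relation): keep the named columns; the set drops duplicates."""
    positions = [schema.index(a) for a in attributes]
    return {tuple(row[i] for i in positions) for row in relation}


name_major = project(students, SCHEMA, ["Name", "Major"])
majors = project(students, SCHEMA, ["Major"])

print(len(name_major))  # 4
print(sorted(majors))   # [('CS',), ('Math',), ('Physics',)]
```

Four input rows yield only three rows for `Major`, because the two `CS` values collapse into one tuple.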

## Union ( ∪ ) Operation in Relational Algebra

- **Symbol:** ∪  
- **Purpose:** To combine tuples from **two relations** and return all **distinct** tuples that appear in either or both relations.  
- **Input:** Two relations with the **same set of attributes (same schema)**  
- **Output:** A relation containing all tuples from both input relations, without duplicates.

---

### Requirements:
- Both relations must be **union-compatible**, meaning they have the **same number of attributes** and corresponding attributes have the **same domain (data type).**

---

### Syntax:

$$
R_1 \cup R_2
$$

Where $ R_1 $ and $ R_2 $ are union-compatible relations.

---

### Example:

Relation **Students_CS**:

| StudentID | Name    | Major |
|-----------|---------|-------|
| 101       | Alice   | CS    |
| 103       | Charlie | CS    |

Relation **Students_Math**:

| StudentID | Name    | Major |
|-----------|---------|-------|
| 102       | Bob     | Math  |
| 105       | Emma    | Math  |
| 103       | Charlie | CS    |

---

Union of **Students_CS** and **Students_Math**:

$$
Students\_CS \cup Students\_Math
$$

**Result:**

| StudentID | Name    | Major |
|-----------|---------|-------|
| 101       | Alice   | CS    |
| 102       | Bob     | Math  |
| 103       | Charlie | CS    |
| 105       | Emma    | Math  |

---

### Summary:
- The **Union** operator combines tuples from both relations.
- Duplicate tuples appear **only once** in the result.
- Both relations must have the same schema to perform union.
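
The union example can be reproduced with Python sets. The `union` helper below is a hypothetical sketch: it checks only that all tuples have the same number of attributes, which is a simplification of full union compatibility (domains are not checked).

```python
students_cs = {(101, "Alice", "CS"), (103, "Charlie", "CS")}
students_math = {(102, "Bob", "Math"), (105, "Emma", "Math"), (103, "Charlie", "CS")}


def union(r1, r2):
    """r1 UNION r2; rejects inputs whose tuples differ in arity."""
    arities = {len(t) for t in r1 | r2}
    if len(arities) > 1:
        raise ValueError("relations are not union-compatible")
    return r1 | r2


result = union(students_cs, students_math)

# (103, 'Charlie', 'CS') occurs in both inputs but only once in the result.
print(sorted(result))
# [(101, 'Alice', 'CS'), (102, 'Bob', 'Math'), (103, 'Charlie', 'CS'), (105, 'Emma', 'Math')]
```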

## Set Difference (−) Operation in Relational Algebra

- **Symbol:** −  
- **Purpose:** To return tuples that are in the **first relation** but **not in the second relation**.  
- **Input:** Two relations with the **same set of attributes (same schema)**  
- **Output:** A relation containing tuples from the first relation that do **not** appear in the second relation.

---

### Requirements:
- Both relations must be **union-compatible**, meaning they have the **same number of attributes** and corresponding attributes have the **same domain (data type).**

---

### Syntax:

$$
R_1 - R_2
$$

Where $ R_1 $ and $ R_2 $ are union-compatible relations.

---

### Example:

Relation **Students_All**:

| StudentID | Name    | Major   |
|-----------|---------|---------|
| 101       | Alice   | CS      |
| 102       | Bob     | Math    |
| 103       | Charlie | CS      |
| 104       | David   | Physics |

Relation **Students_CS**:

| StudentID | Name    | Major |
|-----------|---------|-------|
| 101       | Alice   | CS    |
| 103       | Charlie | CS    |

---

Set Difference to find students who are **not majoring in CS**:

$$
Students\_All - Students\_CS
$$

**Result:**

| StudentID | Name  | Major   |
|-----------|-------|---------|
| 102       | Bob   | Math    |
| 104       | David | Physics |

---

### Summary:
- The **Set Difference** operation subtracts tuples in the second relation from the first.
- Only tuples present in the first relation and absent in the second are returned.
- Both relations must have the same schema.
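
A short Python sketch of the example, again assuming relations are modeled as sets of tuples. It also shows that set difference is not commutative: swapping the operands gives a different result.

```python
students_all = {
    (101, "Alice", "CS"),
    (102, "Bob", "Math"),
    (103, "Charlie", "CS"),
    (104, "David", "Physics"),
}
students_cs = {(101, "Alice", "CS"), (103, "Charlie", "CS")}

not_cs = students_all - students_cs    # Students_All - Students_CS
reversed_diff = students_cs - students_all  # Students_CS - Students_All

print(sorted(not_cs))   # [(102, 'Bob', 'Math'), (104, 'David', 'Physics')]
print(reversed_diff)    # set()
```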

## Set Intersection ( ∩ ) Operation in Relational Algebra

- **Symbol:** ∩  
- **Purpose:** To return tuples that are **common to both relations**.  
- **Input:** Two relations with the **same set of attributes (same schema)**  
- **Output:** A relation containing only the tuples that appear in **both** input relations.

---

### Requirements:
- Both relations must be **union-compatible**, meaning they have the **same number of attributes** and corresponding attributes have the **same domain (data type).**

---

### Syntax:

$$
R_1 \cap R_2
$$

Where $ R_1 $ and $ R_2 $ are union-compatible relations.

---

### Example:

Relation **Students_CS**:

| StudentID | Name    | Major |
|-----------|---------|-------|
| 101       | Alice   | CS    |
| 103       | Charlie | CS    |
| 105       | Emma    | CS    |

Relation **Students_Scholarship**:

| StudentID | Name    | Major |
|-----------|---------|-------|
| 103       | Charlie | CS    |
| 105       | Emma    | CS    |
| 107       | Frank   | Math  |

---

Set Intersection to find students who are **CS majors and also hold a scholarship**:

$$
Students\_CS \cap Students\_Scholarship
$$

**Result:**

| StudentID | Name    | Major |
|-----------|---------|-------|
| 103       | Charlie | CS    |
| 105       | Emma    | CS    |

---

### Note:
The **intersection** of two relations can also be expressed using **set difference**:

$$
R_1 \cap R_2 = R_1 - (R_1 - R_2)
$$

This means we subtract from $ R_1 $ all tuples that are **not in** $ R_2 $, effectively keeping only those that are **also in** $ R_2 $.

---

### Summary:
- Returns only the common tuples between two relations.
- Can be derived using set difference:  
  $$ R \cap S = R - (R - S) $$
- Requires both relations to have the same schema.
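
The identity can be checked on the example data with Python sets (a sketch that assumes relations are sets of tuples):

```python
students_cs = {
    (101, "Alice", "CS"),
    (103, "Charlie", "CS"),
    (105, "Emma", "CS"),
}
students_scholarship = {
    (103, "Charlie", "CS"),
    (105, "Emma", "CS"),
    (107, "Frank", "Math"),
}

# Direct intersection.
direct = students_cs & students_scholarship
# Intersection derived from set difference: R - (R - S).
derived = students_cs - (students_cs - students_scholarship)

print(direct == derived)  # True
print(sorted(direct))     # [(103, 'Charlie', 'CS'), (105, 'Emma', 'CS')]
```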

## Cartesian Product ( ⨯ ) in Relational Algebra

- **Symbol:** ⨯  
- **Purpose:** To combine **every tuple** of one relation with **every tuple** of another relation.  
- **Input:** Two relations (can have different schemas).  
- **Output:** A new relation where each tuple is the **concatenation** of a tuple from the first relation with a tuple from the second.

---

### Syntax:

$$
R_1 \times R_2
$$

Where $ R_1 $ and $ R_2 $ are two relations.

---

### Example:

Relation **Students**:

| StudentID | Name    |
|-----------|---------|
| 1         | Alice   |
| 2         | Bob     |

Relation **Courses**:

| CourseID | CourseName |
|----------|------------|
| C1       | DBMS       |
| C2       | OS         |

---

Cartesian Product:

$$
Students \times Courses
$$

**Result:**

| StudentID | Name  | CourseID | CourseName |
|-----------|-------|----------|------------|
| 1         | Alice | C1       | DBMS       |
| 1         | Alice | C2       | OS         |
| 2         | Bob   | C1       | DBMS       |
| 2         | Bob   | C2       | OS         |

---

### Use Case:

The **Cartesian Product** is often used as an intermediate step in other operations such as **joins**.  
For example, a **theta join** can be expressed as a **Cartesian Product** followed by a **selection**; a **natural join** additionally needs a **projection** to drop the repeated common attributes.

---

### Summary:
- Combines all tuples from two relations.
- Results in a large relation: if $ R_1 $ has $ m $ tuples and $ R_2 $ has $ n $ tuples, the result will have $ m \times n $ tuples.
- Often followed by a selection to form more meaningful relationships (joins).
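
A minimal Python sketch of the product, assuming relations are sets of tuples and using the standard library's `itertools.product` to enumerate the pairs. The `cartesian_product` helper name is hypothetical.

```python
from itertools import product

students = {(1, "Alice"), (2, "Bob")}
courses = {("C1", "DBMS"), ("C2", "OS")}


def cartesian_product(r1, r2):
    """r1 x r2: concatenate every tuple of r1 with every tuple of r2."""
    return {t1 + t2 for t1, t2 in product(r1, r2)}


pairs = cartesian_product(students, courses)

# m = 2 and n = 2, so the result has m * n = 4 tuples.
print(len(pairs) == len(students) * len(courses))  # True
print(sorted(pairs))
# [(1, 'Alice', 'C1', 'DBMS'), (1, 'Alice', 'C2', 'OS'), (2, 'Bob', 'C1', 'DBMS'), (2, 'Bob', 'C2', 'OS')]
```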

## Cartesian Product ( ⨯ ) — Attribute Naming Issues

When performing a **Cartesian Product**, the result contains **all attributes** from both relations.  
However, if both relations have **attributes with the same name**, it creates **naming conflicts**.

---

### Example of Naming Conflict:

Relation **Students**:

| ID | Name  |
|----|-------|
| 1  | Alice |
| 2  | Bob   |

Relation **Courses**:

| ID | Title     |
|----|-----------|
| C1 | DBMS      |
| C2 | Operating |

---

Now if we compute:

$$
Students \times Courses
$$

There will be a naming conflict for the attribute `ID`, because both relations have an `ID` column.

---

### Solution: Use Renaming

We can rename attributes using the **ρ (rho)** operator before applying the Cartesian Product:

$$
\rho_{(StudentID, Name)}(Students) \times \rho_{(CourseID, Title)}(Courses)
$$

This renames:
- `ID` in Students → `StudentID`
- `ID` in Courses → `CourseID`

Now the resulting relation will be:

| StudentID | Name  | CourseID | Title     |
|-----------|-------|----------|-----------|
| 1         | Alice | C1       | DBMS      |
| 1         | Alice | C2       | Operating |
| 2         | Bob   | C1       | DBMS      |
| 2         | Bob   | C2       | Operating |

---

### Summary:
- Cartesian Product results in all possible tuple combinations.
- **Naming conflicts** arise if both relations share attribute names.
- Use the **rename operator ρ** to avoid ambiguity in attribute names.

## Rename Operation in Relational Algebra (ρ)

The **rename operation** is used to:
- Rename an **attribute (column)** or
- Rename an **entire relation (table)**

This is especially useful:
- To avoid **naming conflicts** in operations like **Cartesian product** or **joins**
- To improve readability and clarity of expressions

---

### Symbol:  
The Greek letter **ρ** (rho)

---

### Syntax 1: Rename the entire relation  
$$
\rho(NewRelationName)(OldRelation)
$$

### Syntax 2: Rename attributes  
$$
\rho_{NewAttributeList}(OldRelation)
$$

### Syntax 3: Rename relation and attributes  
$$
\rho(NewRelationName, NewAttributeList)(OldRelation)
$$

---

### Example 1: Rename only relation

If you have a relation:

**Students**

| ID | Name |
|----|------|
| 1  | Alice |

Rename the relation:

$$
\rho(S)(Students)
$$

Now the relation is known as **S**, but still has attributes `ID`, `Name`.

---

### Example 2: Rename attributes

Rename `ID` → `StudentID`, and `Name` → `StudentName`:

$$
\rho_{(StudentID, StudentName)}(Students)
$$

Result:

| StudentID | StudentName |
|-----------|-------------|
| 1         | Alice       |

---

### Example 3: Rename both relation and attributes

$$
\rho(S, (StudentID, StudentName))(Students)
$$

Now the table is renamed to **S**, with renamed columns.

---

### Use Case:

Renaming is essential in cases where:
- You want to perform **Cartesian product** or **join** between relations having **same attribute names**
- You want to write **clearer queries** or **prepare for intermediate steps**

---

### Summary:
- `ρ` is used to **rename** either the relation, its attributes, or both.
- Useful for **avoiding conflicts** and **improving clarity** in expressions.
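
A small Python sketch of attribute renaming, assuming a schema is a tuple of attribute names and that renaming is positional (the `rename` helper is hypothetical). Only the schema changes; the rows of the relation are untouched.

```python
def rename(schema, new_attributes):
    """rho_(new_attributes): replace attribute names position by position."""
    if len(new_attributes) != len(schema):
        raise ValueError("new attribute list must match the arity of the relation")
    return tuple(new_attributes)


students_schema = ("ID", "Name")
courses_schema = ("ID", "Title")

# Without renaming, the schema of Students x Courses would contain "ID" twice.
clash = set(students_schema) & set(courses_schema)

product_schema = (
    rename(students_schema, ["StudentID", "Name"])
    + rename(courses_schema, ["CourseID", "Title"])
)

print(clash)           # {'ID'}
print(product_schema)  # ('StudentID', 'Name', 'CourseID', 'Title')
```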

## Composition of Operations in Relational Algebra

### What is Composition?

- **Composition** means applying **multiple relational algebra operations** one after the other in a sequence.
- The **output** of one operation becomes the **input** to the next.
- This is similar to composing functions in mathematics.

---

### Why Use Composition?

- To express complex queries in a **step-by-step** or **combined** form.
- To **filter**, **transform**, and **combine** data effectively in a single expression.

---

### General Form:

If you want to perform operations `A`, `B`, and `C` on a relation `R`, you can write:

$$
C(B(A(R)))
$$

Here:
- `A(R)` is evaluated first.
- Then `B` is applied to the result of `A(R)`.
- Finally `C` is applied to the result of `B(A(R))`.

---

### Example:

Suppose we have a relation **Employees**:

| EmpID | Name  | Department | Salary |
|-------|-------|------------|--------|
| 1     | Alice | HR         | 40000  |
| 2     | Bob   | IT         | 60000  |
| 3     | Carol | HR         | 45000  |

---

### Task: Get names of employees in HR with salary > 42000

This involves:
1. **Selection** to filter HR employees with salary > 42000  
2. **Projection** to get only the `Name` column

---

### Relational Algebra Expression:

$$
\pi_{Name}(\sigma_{Department='HR' \land Salary > 42000}(Employees))
$$

**Step-by-step:**
1. $ \sigma_{Department='HR' \land Salary > 42000}(Employees) $ → Filters rows
2. $ \pi_{Name}(...) $ → Projects only the Name column

---

### Result:

| Name  |
|-------|
| Carol |

---

### Summary:
- **Composition** allows chaining operations to express complex queries.
- Follow the **inside-out** rule when evaluating.
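
The inside-out evaluation of the Employees query can be sketched in Python. The `select` and `project` helpers are hypothetical and assume rows are tuples described by a fixed `SCHEMA`.

```python
SCHEMA = ("EmpID", "Name", "Department", "Salary")

employees = {
    (1, "Alice", "HR", 40000),
    (2, "Bob", "IT", 60000),
    (3, "Carol", "HR", 45000),
}


def select(relation, predicate):
    """sigma: keep rows whose attribute dictionary satisfies the predicate."""
    return {row for row in relation if predicate(dict(zip(SCHEMA, row)))}


def project(relation, attributes):
    """pi: keep only the named columns."""
    positions = [SCHEMA.index(a) for a in attributes]
    return {tuple(row[i] for i in positions) for row in relation}


# Inner operation first: selection.
hr_high = select(employees, lambda e: e["Department"] == "HR" and e["Salary"] > 42000)
# Outer operation second: projection applied to the selection's output.
names = project(hr_high, ["Name"])

print(names)  # {('Carol',)}
```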

## Natural Join with Two Common Columns

### What is Natural Join?

A **Natural Join (⨝)** matches rows from two relations **based on all attributes with the same name** and keeps **one copy** of each common attribute in the result.

---

### When Two Columns are Common

If two relations share **two attribute names**, the **natural join condition** will match tuples **only when both common attributes are equal**.

---

### Example:

#### Table 1: `Sales`

| EmpID | Product | Region | Month |
|-------|---------|--------|-------|
| 1     | Pen     | East   | Jan   |
| 2     | Pencil  | West   | Feb   |
| 3     | Eraser  | East   | Mar   |

#### Table 2: `Targets`

| Region | Month | Target |
|--------|-------|--------|
| East   | Jan   | 1000   |
| East   | Mar   | 800    |
| West   | Jan   | 900    |

---

### Natural Join Query:

$$
Sales \bowtie Targets
$$

**Common columns:** `Region` and `Month`

---

### Result:

| EmpID | Product | Region | Month | Target |
|-------|---------|--------|-------|--------|
| 1     | Pen     | East   | Jan   | 1000   |
| 3     | Eraser  | East   | Mar   | 800    |

---

### Explanation:

- Only rows where **both `Region` and `Month` match** are included.
- Columns `Region` and `Month` appear only once in the final output.
- `Pencil` (West, Feb) is excluded because no match is found in `Targets`.

---

### Summary:
- Natural Join with **multiple common attributes** ensures **strict matching on all**.
- It's a powerful way to combine related data with minimal redundancy.
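
A minimal Python sketch of a natural join, assuming each relation is a list of dictionaries keyed by attribute name. The `natural_join` function is a hypothetical nested-loop illustration, not an efficient implementation.

```python
def natural_join(r1, r2):
    """Join two relations on every attribute name they share."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            # Tuples match only if they agree on all common attributes.
            if all(t1[a] == t2[a] for a in common):
                row = {**t1, **t2}  # each common attribute is stored once
                if row not in joined:
                    joined.append(row)
    return joined


sales = [
    {"EmpID": 1, "Product": "Pen", "Region": "East", "Month": "Jan"},
    {"EmpID": 2, "Product": "Pencil", "Region": "West", "Month": "Feb"},
    {"EmpID": 3, "Product": "Eraser", "Region": "East", "Month": "Mar"},
]
targets = [
    {"Region": "East", "Month": "Jan", "Target": 1000},
    {"Region": "East", "Month": "Mar", "Target": 800},
    {"Region": "West", "Month": "Jan", "Target": 900},
]

result = natural_join(sales, targets)
print(sorted((r["Product"], r["Target"]) for r in result))
# [('Eraser', 800), ('Pen', 1000)]
```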

## Logical Steps of Natural Join: Sales and Targets Example Without the Join Operator

### Given Tables:

**Sales**

| EmpID | Product | Region | Month |
|-------|---------|--------|-------|
| 1     | Pen     | East   | Jan   |
| 2     | Pencil  | West   | Feb   |
| 3     | Eraser  | East   | Mar   |

**Targets**

| Region | Month | Target |
|--------|-------|--------|
| East   | Jan   | 1000   |
| East   | Mar   | 800    |
| West   | Jan   | 900    |

---

### Step 1: Cartesian Product

Combine every row of Sales with every row of Targets:

| EmpID | Product | Sales.Region | Sales.Month | Targets.Region | Targets.Month | Target |
|-------|---------|--------------|-------------|----------------|---------------|--------|
| 1     | Pen     | East         | Jan         | East           | Jan           | 1000   |
| 1     | Pen     | East         | Jan         | East           | Mar           | 800    |
| 1     | Pen     | East         | Jan         | West           | Jan           | 900    |
| 2     | Pencil  | West         | Feb         | East           | Jan           | 1000   |
| 2     | Pencil  | West         | Feb         | East           | Mar           | 800    |
| 2     | Pencil  | West         | Feb         | West           | Jan           | 900    |
| 3     | Eraser  | East         | Mar         | East           | Jan           | 1000   |
| 3     | Eraser  | East         | Mar         | East           | Mar           | 800    |
| 3     | Eraser  | East         | Mar         | West           | Jan           | 900    |

---

### Step 2: Selection

Keep only rows where `Sales.Region = Targets.Region` **AND** `Sales.Month = Targets.Month`:

| EmpID | Product | Sales.Region | Sales.Month | Targets.Region | Targets.Month | Target |
|-------|---------|--------|-------|--------|-------|--------|
| 1     | Pen     | East   | Jan   | East   | Jan   | 1000   |
| 3     | Eraser  | East   | Mar   | East   | Mar   | 800    |

---

### Step 3: Projection

Drop the repeated columns (`Targets.Region` and `Targets.Month`); the remaining `Sales.Region` and `Sales.Month` are shown simply as `Region` and `Month`:

| EmpID | Product | Region | Month | Target |
|-------|---------|--------|-------|--------|
| 1     | Pen     | East   | Jan   | 1000   |
| 3     | Eraser  | East   | Mar   | 800    |

---

### Final Result

This matches the natural join output: rows where both `Region` and `Month` values are equal in both tables, with no duplicate columns.

---

### Summary:

For this example, the natural join is equivalent to:

$$
\pi_{EmpID, Product, Sales.Region, Sales.Month, Target} \Big( \sigma_{Sales.Region = Targets.Region \land Sales.Month = Targets.Month} (Sales \times Targets) \Big)
$$
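
The three steps can be traced in Python, assuming rows are tuples whose positions follow the column order of the tables above (positional indexing is an illustrative shortcut):

```python
from itertools import product

sales = {
    (1, "Pen", "East", "Jan"),
    (2, "Pencil", "West", "Feb"),
    (3, "Eraser", "East", "Mar"),
}
targets = {("East", "Jan", 1000), ("East", "Mar", 800), ("West", "Jan", 900)}

# Step 1: Cartesian product. Column positions in each combined row:
# 0 EmpID, 1 Product, 2 Sales.Region, 3 Sales.Month,
# 4 Targets.Region, 5 Targets.Month, 6 Target
step1 = {s + t for s, t in product(sales, targets)}

# Step 2: selection on Sales.Region = Targets.Region AND Sales.Month = Targets.Month
step2 = {row for row in step1 if row[2] == row[4] and row[3] == row[5]}

# Step 3: projection that drops Targets.Region and Targets.Month
step3 = {row[:4] + row[6:] for row in step2}

print(len(step1), len(step2))  # 9 2
print(sorted(step3))
# [(1, 'Pen', 'East', 'Jan', 1000), (3, 'Eraser', 'East', 'Mar', 800)]
```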

## Aggregate Operators in Relational Algebra

### What are Aggregate Operators?

Aggregate operators perform **computations on a set of tuples** (rows) to return a **single summarized value**.  
They are used to compute summaries like totals, averages, counts, etc.

---

### Common Aggregate Operators:

| Operator | Meaning                       | Example Use                          |
|----------|-------------------------------|------------------------------------|
| **COUNT**  | Counts the number of tuples     | Count how many employees are there |
| **SUM**    | Adds up values of an attribute  | Sum of all sales amounts            |
| **AVG**    | Computes average value           | Average salary                      |
| **MIN**    | Finds minimum value              | Minimum age                        |
| **MAX**    | Finds maximum value              | Maximum score                     |

---

### Example:

Assume a relation `Sales` with attributes: `EmpID`, `Product`, `Amount`

| EmpID | Product | Amount |
|-------|---------|--------|
| 1     | Pen     | 100    |
| 2     | Pencil  | 150    |
| 1     | Eraser  | 200    |

- **COUNT**: Number of sales made  
  $$ \text{COUNT}(Sales) = 3 $$

- **SUM**: Total sales amount  
  $$ \text{SUM}_{Amount}(Sales) = 100 + 150 + 200 = 450 $$

- **AVG**: Average sales amount  
  $$ \text{AVG}_{Amount}(Sales) = \frac{450}{3} = 150 $$

- **MIN**: Minimum sales amount  
  $$ \text{MIN}_{Amount}(Sales) = 100 $$

- **MAX**: Maximum sales amount  
  $$ \text{MAX}_{Amount}(Sales) = 200 $$
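
These values can be reproduced with Python built-ins. The sketch keeps the amounts in a list rather than a set, on the assumption that every sale should be counted even if two sales happen to have the same amount.

```python
sales = [
    (1, "Pen", 100),
    (2, "Pencil", 150),
    (1, "Eraser", 200),
]

# Keep all Amount values, including any repeats.
amounts = [amount for _emp_id, _product, amount in sales]

count = len(sales)
total = sum(amounts)
average = total / count
minimum = min(amounts)
maximum = max(amounts)

print(count, total, average, minimum, maximum)  # 3 450 150.0 100 200
```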

---

### Grouping with Aggregate Operators

Aggregate operators can be combined with **grouping** to calculate values per group.

For example, calculate total sales per employee:

| EmpID | Total_Sales |
|-------|-------------|
| 1     | 300         |
| 2     | 150         |
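
A minimal Python sketch of the grouping step, assuming the same `Sales` rows as tuples and using `collections.defaultdict` to accumulate one running total per `EmpID`:

```python
from collections import defaultdict

sales = [
    (1, "Pen", 100),
    (2, "Pencil", 150),
    (1, "Eraser", 200),
]

# Group by EmpID and sum Amount within each group.
totals = defaultdict(int)
for emp_id, _product, amount in sales:
    totals[emp_id] += amount

print(dict(totals))  # {1: 300, 2: 150}
```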

In SQL, this is expressed as:

```sql
SELECT EmpID, SUM(Amount) AS Total_Sales
FROM Sales
GROUP BY EmpID;
```

## Aggregate Operators with Condition Example

### Relation `Sales`:

| EmpID | Product | Amount |
|-------|---------|--------|
| 1     | Pen     | 100    |
| 2     | Pencil  | 150    |
| 1     | Eraser  | 200    |

---

### Task: Calculate the sum of `Amount` **only for sales greater than 100**

---

### Step 1: Selection (filter rows with `Amount > 100`)

$$
S = \sigma_{Amount > 100}(Sales)
$$

Filtered relation `S`:

| EmpID | Product | Amount |
|-------|---------|--------|
| 2     | Pencil  | 150    |
| 1     | Eraser  | 200    |

---

### Step 2: Apply SUM aggregate on filtered relation

$$
\text{SUM}_{Amount}(S) = 150 + 200 = 350
$$

---

### Summary:

- First filter tuples where `Amount > 100`.
- Then calculate the sum of the filtered `Amount` values.

---

### SQL equivalent:

```sql
SELECT SUM(Amount)
FROM Sales
WHERE Amount > 100;
```

### Combining Selection and Aggregation in One Step

#### Goal:
Calculate the sum of `Amount` from `Sales` where `Amount > 100`, **in one operation**.

---

#### Relational Algebra Expression:

$$
\text{SUM}_{Amount} \big( \sigma_{Amount > 100} (Sales) \big)
$$

- First, select rows with `Amount > 100`.
- Then, aggregate the `Amount` attribute with SUM.

---

#### SQL Equivalent:

```sql
SELECT SUM(Amount)
FROM Sales
WHERE Amount > 100;
```

## Relational Languages

Relational languages are used to **define, manipulate, and query data** stored in relational databases.

---

### Types of Relational Languages:

1. **Relational Algebra (Procedural)**
   - A formal set of operations on relations (tables).
   - Specifies *how* to retrieve data.
   - Uses operations like selection, projection, union, difference, Cartesian product, join, etc.
   - Example:  
     $$
     \sigma_{Age > 25}(Employees)
     $$
     selects employees older than 25.

2. **Relational Calculus (Declarative)**
   - Specifies *what* data to retrieve rather than how.
   - Based on mathematical logic.
   - Two types:
     - **Tuple Relational Calculus (TRC)**
     - **Domain Relational Calculus (DRC)**
   - Example (TRC):  
     $$
     \{ t \mid t \in Employees \land t.Age > 25 \}
     $$
     means "find all tuples t in Employees where Age > 25."

3. **SQL (Structured Query Language)**
   - Industry-standard declarative language.
   - Allows data definition, manipulation, and control.
   - Combines concepts of relational algebra and calculus.
   - Example:  
     ```sql
     SELECT * FROM Employees WHERE Age > 25;
     ```

---

### Procedural vs Declarative:

| Aspect                | Procedural (Relational Algebra)          | Declarative (Relational Calculus, SQL)      |
|-----------------------|-----------------------------------------|---------------------------------------------|
| What to specify?      | How to get the result                    | What result is needed                        |
| Programmer's task     | Specify step-by-step operations         | Specify conditions or properties            |
| Example              | Use selection, projection, join etc.     | Use logical predicates or SQL SELECT query |

---

### Summary:

- **Relational Algebra** is a procedural query language with a fixed set of operations.
- **Relational Calculus** is a declarative language based on logic.
- **SQL** is the most widely used practical language based on declarative concepts.
- Understanding relational languages helps in writing efficient queries and designing databases.

## Summary of Relational Algebra Operators

Relational Algebra provides a set of fundamental operations to manipulate and query relations (tables) in a database.

| Operator             | Symbol          | Description                                    | Example Use                             |
|----------------------|-----------------|------------------------------------------------|---------------------------------------|
| **Selection**        | $ \sigma $    | Selects rows (tuples) that satisfy a condition | $ \sigma_{Age > 25}(Employees) $    |
| **Projection**       | $ \pi $      | Selects specific columns (attributes)           | $ \pi_{Name, Age}(Employees) $      |
| **Union**            | $ \cup $      | Combines tuples from two relations (no duplicates) | $ R \cup S $                      |
| **Set Difference**   | $ -$        | Tuples in one relation but not in the other      | $ R - S $                          |
| **Intersection**     | $ \cap $      | Tuples common to both relations                    | $ R \cap S $                      |
| **Cartesian Product**| $ \times $    | Combines every tuple of one relation with every tuple of another | $ R \times S $              |
| **Rename**           | $ \rho $      | Renames relation or attributes                     | $ \rho_{NewName}(R) $               |
| **Join**             | $\bowtie$ | Combines related tuples from two relations based on a condition | Natural join: $ R \bowtie S $     |

---

### Notes:
- **Selection, Projection, and Rename** are unary operators (operate on one relation).
- **Union, Difference, Intersection, Cartesian Product, and Join** are binary operators (operate on two relations).
- Join is a key operator used to combine relations based on common attributes.